Single Precision Floating Point Unit
Mani Sudha Yalamanchi (sudhadimpy@yahoo.com)
Rajendar Koltur (rkoltur@yahoo.com)
Acknowledgements
Our work is based upon the openipcore Floating Point Unit core designed
and coded in Verilog by Rudolf Usselmann. We are indebted to him
and www.opencores.org for providing the source code. We also thank Raghu
Yerramreddikalva of Intel Corporation for helping us with the synthesis
of the floating point unit; with his help, we were able to synthesize
our code using Synopsys and a real target library.
Objective
The objective of this project is to design a single precision floating
point unit core in VHDL, simulate it, and synthesize it. Currently
the core supports only floating point addition, subtraction and multiplication.
Methodology
We started off by studying computer arithmetic in Reference 2. Next,
we read the IEEE standard 754 on binary floating point arithmetic [6].
Reference 5 lists a number of algorithms for high performance floating
point arithmetic. To handle the complexity, we leveraged an existing
design in Verilog. We rewrote the code in VHDL, learning a lot
about both languages in the process. We designed our own testbench
and, in addition, used the testing methodology adopted by the OpenCores design
and reran their tests. Finally, we synthesized the design using a
real ASIC library and wire load model.
Introduction
The FPU core has the following features.
Implements Single Precision (32-bit).
Implements floating point addition, subtraction and multiplication.
Implements all four rounding modes: round to nearest even, round towards +inf,
round towards -inf and round to zero.
All exceptions are implemented and reported according to the IEEE standard.
Entity
Signal Descriptions
Inputs:
clk - clock
opa, opb - input operands A and B
rmode - rounding mode (00 - round to nearest even, 01 - round to zero, 10 - round towards +inf, 11 - round towards -inf)
fpu_op - floating point operation (0 - add, 1 - subtract, 2 - multiply, 3 - divide, 4 - integer to float conversion, 5 - float to integer conversion)
Outputs:
fpout - result output
inf - asserted when the output is the special value INF
ine - asserted when the calculation is inexact, i.e. some accuracy has been lost during the computation
overflow - asserted when an overflow occurs, i.e. the result is too large to be represented
underflow - asserted when an underflow occurs, i.e. the result is too small to be represented
div_by_zero - asserted when fpu_op is set to divide and opb is zero
zero - asserted when the output is a numeric zero
snan - asserted when either operand is a signaling NaN (SNaN)
qnan - asserted when the output is a quiet NaN (QNaN)
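Based on the signal descriptions above, the VHDL entity can be sketched as follows. The port widths (32-bit operands and result, a 2-bit rounding mode, and a 3-bit opcode covering the six operations) are assumptions consistent with the table, not a copy of the actual declaration in fpu_arch.vhd.

library ieee;
use ieee.std_logic_1164.all;

-- Sketch of the FPU entity implied by the signal table above.
entity fpu is
  port (
    clk         : in  std_logic;
    opa, opb    : in  std_logic_vector(31 downto 0);  -- IEEE single precision operands
    rmode       : in  std_logic_vector(1 downto 0);   -- rounding mode
    fpu_op      : in  std_logic_vector(2 downto 0);   -- operation code
    fpout       : out std_logic_vector(31 downto 0);  -- result
    inf         : out std_logic;
    ine         : out std_logic;
    overflow    : out std_logic;
    underflow   : out std_logic;
    div_by_zero : out std_logic;
    zero        : out std_logic;
    snan        : out std_logic;
    qnan        : out std_logic
  );
end entity fpu;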
Floating Point Multiplication Algorithm
The algorithm for floating point multiplication consists of the following
steps.
Check for zeros, NaNs and INF on the inputs.
Add the exponents and subtract the bias.
Multiply the mantissas.
Normalize the product and round using the specified rounding mode.
Also generate exceptions.
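A minimal VHDL sketch of the central multiply step is given below: the biased exponents are added and the bias (127) is subtracted once, and the two 24-bit mantissas (with the hidden bit restored) are multiplied into a 48-bit raw product. Signal names and widths are illustrative; special value checks, normalization, rounding and exception generation are handled by the surrounding blocks.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fmul_core_sketch is
  port (
    exp_a, exp_b : in  unsigned(7 downto 0);   -- biased exponents
    man_a, man_b : in  unsigned(23 downto 0);  -- mantissas with the hidden '1' restored
    exp_p        : out unsigned(8 downto 0);   -- biased product exponent (one extra bit)
    man_p        : out unsigned(47 downto 0)   -- raw 48-bit product, still to be normalized
  );
end entity;

architecture rtl of fmul_core_sketch is
begin
  -- add the exponents and remove one copy of the bias
  -- (a real implementation also guards against exponent overflow/underflow here)
  exp_p <= resize(exp_a, 9) + resize(exp_b, 9) - 127;
  -- multiply the mantissas; the post-normalize unit shifts and rounds this product
  man_p <= man_a * man_b;
end architecture;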
Floating Point Addition Algorithm
The algorithm for floating point addition/subtraction consists of the following
steps.
Check for special values on the inputs.
Align the mantissas, i.e. right-shift the significand of the smaller operand
by d bits, where d is the exponent difference.
Add or subtract the mantissas.
Normalize the result and round using the specified rounding mode. Also
generate exceptions.
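The alignment and add step can be sketched in VHDL as follows. Names and widths are again illustrative and only the effective-add case is shown; sign resolution, the guard/round/sticky bits shifted out during alignment, rounding and exception generation are omitted.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fadd_core_sketch is
  port (
    exp_a, exp_b : in  unsigned(7 downto 0);   -- biased exponents
    man_a, man_b : in  unsigned(23 downto 0);  -- mantissas with the hidden '1' restored
    exp_r        : out unsigned(7 downto 0);   -- tentative result exponent
    sum          : out unsigned(24 downto 0)   -- raw sum with one carry bit
  );
end entity;

architecture rtl of fadd_core_sketch is
begin
  process (exp_a, exp_b, man_a, man_b)
    variable d : natural;  -- exponent difference = shift distance
  begin
    if exp_a >= exp_b then
      d     := to_integer(exp_a - exp_b);
      exp_r <= exp_a;
      sum   <= resize(man_a, 25) + shift_right(resize(man_b, 25), d);
    else
      d     := to_integer(exp_b - exp_a);
      exp_r <= exp_b;
      sum   <= resize(man_b, 25) + shift_right(resize(man_a, 25), d);
    end if;
  end process;
end architecture;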
Microarchitecture
The FPU core consists of the following units.
Pre Normalize Block for Add/Subtract - Calculates the difference between
the smaller and larger exponents, adjusts the smaller fraction by right shifting
it, determines whether the operation is an add or a subtract after resolving the
sign bits, and checks for NaNs on the inputs.
Pre Normalize Block for Mul/Div - Computes the sum/difference of the exponents,
checks for exponent overflow and underflow conditions and for an INF value on an input.
Add/Sub - 24-bit integer adder/subtractor.
Multiply - 2-cycle 24-bit integer multiplier.
Divide - 2-cycle integer divider and remainder computation unit.
Post Normalize and Round Unit - Normalizes the fraction and exponent. All
four roundings are computed in parallel and the output corresponding
to the chosen rounding mode is selected (a small sketch of this selection is given after this list).
Exceptions Unit - Logic to stage and generate the exception signals.
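To illustrate the parallel rounding in the post normalize unit, the final selection step might look like the sketch below; the four pre-rounded results are assumed to be computed elsewhere, and the ports are illustrative rather than the unit's actual interface.

library ieee;
use ieee.std_logic_1164.all;

entity round_select_sketch is
  port (
    rmode                              : in  std_logic_vector(1 downto 0);
    res_rne, res_rtz, res_rup, res_rdn : in  std_logic_vector(31 downto 0);  -- results rounded each way
    res                                : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of round_select_sketch is
begin
  -- all four roundings were computed in parallel; the mode just picks one
  with rmode select
    res <= res_rne when "00",   -- round to nearest even
           res_rtz when "01",   -- round to zero
           res_rup when "10",   -- round towards +inf
           res_rdn when others; -- round towards -inf
end architecture;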
The block diagram is reproduced from Reference 1 and is given below.
Datapath and Pipeline
Code
pre_norm_arch.vhd
pre_norm_fmul_arch.vhd
add_sub27_arch.vhd
mul_r2_arch.vhd
div_r2_arch.vhd
post_norm_arch.vhd
except_arch.vhd
fpu_arch.vhd
Simulation
We used two strategies for testing. The first and simpler one was
to write a testbench that exercised various features of the design.
The floating point calculations are written to a file in an easy-to-read
format with the hidden bits recovered, which allowed us to verify
the calculations by hand.
A sample output file is given here.
Click here for the testbench
and related
package file.
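A stripped-down version of this first strategy might look like the sketch below: drive one hand-picked operand pair, wait for the result, and log it in hexadecimal to a text file for manual checking. The port names follow the entity sketch given earlier; the wait time and output file name are placeholders, and the actual testbench and package are the ones linked above.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_textio.all;
use std.textio.all;

entity fpu_tb_sketch is
end entity;

architecture sim of fpu_tb_sketch is
  signal clk          : std_logic := '0';
  signal opa, opb     : std_logic_vector(31 downto 0);
  signal fpout        : std_logic_vector(31 downto 0);
  signal rmode        : std_logic_vector(1 downto 0) := "00";  -- round to nearest even
  signal fpu_op       : std_logic_vector(2 downto 0) := "000"; -- add, per the signal table
  signal inf, ine, overflow, underflow, div_by_zero, zero, snan, qnan : std_logic;
begin
  clk <= not clk after 5 ns;  -- 100 MHz clock

  dut: entity work.fpu
    port map (clk => clk, opa => opa, opb => opb, rmode => rmode, fpu_op => fpu_op,
              fpout => fpout, inf => inf, ine => ine, overflow => overflow,
              underflow => underflow, div_by_zero => div_by_zero, zero => zero,
              snan => snan, qnan => qnan);

  stim: process
    file results : text open write_mode is "fpu_results.txt";  -- placeholder file name
    variable l   : line;
  begin
    opa <= x"3F800000";  -- 1.0
    opb <= x"40000000";  -- 2.0
    wait for 100 ns;     -- placeholder: long enough for the pipeline to produce the result
    write(l, string'("1.0 + 2.0 = 0x"));
    hwrite(l, fpout);
    writeline(results, l);
    wait;
  end process;
end architecture;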
The second strategy is the one used by the OpenCores design, and the
testing is much more extensive. A software implementation of the
IEEE standard, SoftFloat, is used as the reference design.
This library is used to generate millions of test vectors, which are
stored in a file. A test can either be a random test that selects the operation
and rounding mode at random, or a focused test that targets one FPU operation and
rounding mode. The packed vectors are read from the test file and
compared with the output produced by the FPU core. The model was simulated
using ModelSim. All vectors that passed on the Verilog model also
passed on the VHDL model.
Synthesis
The use of VHDL operators made our code compact and simple. We found that
all of the operators except division (/) and remainder (rem)
were synthesizable. We synthesized our design using an ASIC
library at Intel Corporation with Synopsys Design Compiler and the DesignWare library.
Note that the integer divider and remainder code was disabled during synthesis.
The first step is to analyze and elaborate the design. This ensures
that there are no errors in the VHDL and produces an unoptimized
netlist.
dc_shell> define_design_lib work -path ../mra
dc_shell> analyze -library work -format vhdl ../hdl/add_sub27_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/div_r2_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/mul_r2_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/except_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/pre_norm_fmul_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/pre_norm_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/post_norm_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/fpu_arch.vhd
dc_shell> elaborate -library work fpu
dc_shell> write -format db -hier -output fpu_pre.db
Next we set the wire load model and operating conditions and define the
clock. Finally, the design is compiled to the target library.
The synthesis script is given here.
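For reference, a typical constraint and compile sequence has the following shape. The wire load model name, operating condition and output file names are illustrative placeholders; the actual values are in the script linked above. A 10 ns clock period corresponds to the 100 MHz result reported below.
dc_shell> set_wire_load_model -name "typical_wlm"
dc_shell> set_operating_conditions WCCOM
dc_shell> create_clock clk -period 10
dc_shell> compile -map_effort medium
dc_shell> report_timing > fpu_timing.rpt
dc_shell> report_area > fpu_area.rpt
dc_shell> write -format db -hier -output fpu_post.db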
Some of the questions we answered were:
1. Up to what frequency can the FPU operate?
The FPU can operate at a frequency of 100 MHz.
2. How many library cells are used?
A total of 6871 cells were used.
3. What are the critical paths?
The slowest timing paths are in the post normalization unit, especially
the generation of the underflow and overflow signals.
Some of the reports from the synthesis are:
fpu_check_design.rpt..
fpu_compile_scr.txt
fpu_constraints_viol..
fpu_loop.rpt.txt
fpu_timing.rpt.txt
Conclusions
We learned a lot in this project: VHDL and Verilog coding
and syntax, floating point unit microarchitecture, floating point
addition, multiplication and division algorithms, the IEEE standard for
binary floating-point arithmetic, design issues such as pipelining,
verification strategies, and synthesis.
References
1. Rudolf Usselmann, Documentation for Floating Point Unit, http://www.opencores.org.
2. John L. Hennessy and David A. Patterson, Computer Architecture: A
Quantitative Approach, 2nd Edition, Morgan Kaufmann, Appendix A.
3. Peter J. Ashenden, The Designer's Guide to VHDL, Morgan Kaufmann.
4. Donald E. Thomas and Philip R. Moorby, The Verilog Hardware Description
Language, Kluwer Academic Publishers.
5. Stuart Oberman, “Design Issues in High Performance Floating-Point
Arithmetic Units”, Stanford University Technical report.
6. IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.