Single Precision Floating Point Unit

Mani Sudha Yalamanchi (sudhadimpy@yahoo.com)
Rajendar Koltur (rkoltur@yahoo.com)

Acknowledgements

Our work is based on the open IP core Floating Point Unit designed and coded in Verilog by Rudolf Usselmann. We are indebted to him and to www.opencores.org for providing the source code. We also thank Raghu Yerramreddikalva of Intel Corporation for helping us with the synthesis of the floating point unit. With his help, we were able to synthesize our code using Synopsys tools and a real target library.

Objective

The objective of this project is to design a single-precision floating point unit core in VHDL, then simulate and synthesize it. Currently the core supports only floating point addition, subtraction and multiplication.

Methodology

We started off by studying computer arithmetic in Reference 2. Next, we read IEEE Standard 754 for binary floating point arithmetic [6]. Reference 5 listed a number of algorithms for high performance floating point arithmetic. To handle the complexity, we leveraged an existing design in Verilog. We rewrote the code in VHDL, learning a lot about both languages in the process. We designed our own testbench and, in addition, adopted the testing methodology of the opencores design and reran its tests. Finally, we synthesized the design using a real ASIC library and wire load model.

Introduction

The FPU core has the following features.

Implements Single Precision (32-bit).

Implements Floating point addition, subtraction and multiplication

Implements all four rounding modes, round to nearest, round towards +inf, round towards -inf and round to zero.

All exceptions implemented and reported according to the IEEE standard.

Entity

Signal Descriptions

Inputs:

clk	clock
opa, opb	input operands A and B
rmode	rounding mode (00-round to nearest even, 01-round to zero, 10-round to +inf, 11-round to -inf)
fpu_op	floating point operation (0 - Add, 1 - Subtract, 2 - Multiply, 3 - Divide, 4 - Int to float conversion, 5 - Float to int conversion; only operations 0 to 2 are currently supported)
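
As a reading aid, the operand layout and the control encodings above can be modelled in a few lines of Python. This is only a software sketch, not part of the VHDL core: the names `RoundMode`, `FpuOp`, `unpack_single` and `significand` are hypothetical, the field widths are those of the IEEE 754 single format, and only the three operations the core currently supports are listed.

```python
from enum import IntEnum


class RoundMode(IntEnum):
    """rmode encodings as listed in the signal table (names are hypothetical)."""
    NEAREST_EVEN = 0b00
    TO_ZERO = 0b01
    TO_POS_INF = 0b10
    TO_NEG_INF = 0b11


class FpuOp(IntEnum):
    """fpu_op encodings for the operations the core currently supports."""
    ADD = 0
    SUB = 1
    MUL = 2


def unpack_single(word):
    """Split a 32-bit single into (sign, biased exponent, fraction)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, exponent, fraction


def significand(exponent, fraction):
    """Return the 24-bit significand with the hidden bit restored.

    The hidden bit is 1 for normal numbers and 0 when the exponent field is 0.
    """
    hidden = 0 if exponent == 0 else 1
    return (hidden << 23) | fraction
```

For example, the bit pattern 0x3FC00000 (the value 1.5) unpacks to sign 0, biased exponent 127 and fraction 0x400000, which gives the significand 0xC00000.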

Outputs:

fpout	result output
inf	Asserted when the output is the special value INF
ine	Asserted when the calculation is inexact, i.e. some accuracy has been lost during computation
overflow	Asserted when an overflow occurs, i.e. the number is too large to be represented
underflow	Asserted when an underflow occurs, i.e. the number is too small to be represented
div_by_zero	Asserted when the fpu_op is set to divide and opb is zero
zero	Asserted when the output is a numeric zero
snan	Asserted when either operand is a SNAN
qnan	Asserted when the output is a QNAN
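
The value-dependent flags (inf, zero, snan, qnan) can be read directly off the bit pattern of an operand or result. The sketch below is a software illustration, not the logic of the exceptions unit. The function name `classify` is hypothetical, and it assumes the common convention that the most significant fraction bit distinguishes a quiet NaN (1) from a signaling NaN (0).

```python
def classify(word):
    """Return the value-dependent flags for one 32-bit single-precision word."""
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    is_nan = exponent == 0xFF and fraction != 0
    # Assumption: the top fraction bit is the "quiet" bit.
    quiet = bool(fraction & 0x400000)
    return {
        "inf": exponent == 0xFF and fraction == 0,
        "zero": exponent == 0 and fraction == 0,
        "qnan": is_nan and quiet,
        "snan": is_nan and not quiet,
    }
```

In terms of the signal table, `snan` would be evaluated on opa and opb, while `inf`, `zero` and `qnan` would be evaluated on fpout.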

Floating Point Multiplication Algorithm

The algorithm for Floating Point Multiplication consists of the following steps.

Check for zeros, NaNs and infinities on the inputs.

Add the exponents

Multiply the mantissas

Normalize the product and round using the specified rounding mode. Also generate exceptions.
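
The multiplication steps above can be sketched as a small bit-level software model. This is an illustration under stated assumptions, not the VHDL implementation: it skips the special-value check by assuming both operands are normal and the result is a normal, in-range number, it implements round to nearest even only, and it does not generate exception flags. The names `float_to_bits`, `bits_to_float` and `fmul_normal` are hypothetical.

```python
import struct


def float_to_bits(x):
    """Round a Python float to single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(word):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def fmul_normal(a, b):
    """Multiply two normal singles given as bit patterns (round to nearest even)."""
    sign = (a ^ b) >> 31
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    sig_a = (a & 0x7FFFFF) | 0x800000   # restore the hidden bit
    sig_b = (b & 0x7FFFFF) | 0x800000

    exp = exp_a + exp_b - 127            # add exponents, remove one bias
    prod = sig_a * sig_b                 # 48-bit product, leading 1 at bit 47 or 46

    # Normalize: keep the top 24 bits, the rest decides the rounding.
    if prod >> 47:
        exp += 1
        shift = 24
    else:
        shift = 23
    kept = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if rem > half or (rem == half and kept & 1):
        kept += 1
        if kept >> 24:                   # rounding carried into a new bit
            kept >>= 1
            exp += 1
    return (sign << 31) | (exp << 23) | (kept & 0x7FFFFF)
```

For instance, `bits_to_float(fmul_normal(float_to_bits(1.5), float_to_bits(2.5)))` evaluates to 3.75.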

Floating Point Addition Algorithm

The algorithm for Floating Point Addition/Subtraction consists of the following steps.

Check for special values on inputs.

Align the mantissas, i.e. right shift the significand of the smaller operand by d bits, where d is the difference between the two exponents.

Add or subtract the mantissas

Normalize the result and round using the specified rounding mode. Also generate exceptions.
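
The addition steps above can be sketched in the same way. This is a software illustration under stated assumptions, not the VHDL implementation: operands are assumed to be normal, the result is assumed to stay in the normal range, only round to nearest even is modelled, and no exception flags are produced. The three extra low-order bits (guard, round, sticky) are a standard textbook choice and are an assumption here. All function names are hypothetical.

```python
import struct


def float_to_bits(x):
    """Round a Python float to single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(word):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def fadd_normal(a, b):
    """Add two normal singles given as bit patterns (round to nearest even)."""
    sign_a, sign_b = a >> 31, b >> 31
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    # 24-bit significands extended by guard, round and sticky bits.
    sig_a = ((a & 0x7FFFFF) | 0x800000) << 3
    sig_b = ((b & 0x7FFFFF) | 0x800000) << 3

    # Make (sign_a, exp_a, sig_a) the operand with the larger magnitude.
    if (exp_a, sig_a) < (exp_b, sig_b):
        sign_a, sign_b = sign_b, sign_a
        exp_a, exp_b = exp_b, exp_a
        sig_a, sig_b = sig_b, sig_a

    # Align: right shift the smaller significand by d, OR lost bits into sticky.
    d = exp_a - exp_b
    lost = sig_b & ((1 << d) - 1)
    sig_b = (sig_b >> d) | (1 if lost else 0)

    # Effective add or subtract, decided by the two sign bits.
    total = sig_a + sig_b if sign_a == sign_b else sig_a - sig_b
    if total == 0:
        return 0                          # exact cancellation gives +0
    exp = exp_a

    # Normalize so that the leading 1 sits at bit 26.
    if total >> 27:
        total = (total >> 1) | (total & 1)
        exp += 1
    while not (total >> 26):
        total <<= 1
        exp -= 1

    # Round to nearest, ties to even, using the three extra bits.
    kept, rem = total >> 3, total & 0x7
    if rem > 4 or (rem == 4 and kept & 1):
        kept += 1
        if kept >> 24:
            kept >>= 1
            exp += 1
    return (sign_a << 31) | (exp << 23) | (kept & 0x7FFFFF)


def fsub_normal(a, b):
    """Subtract by flipping the sign of b and adding."""
    return fadd_normal(a, b ^ 0x80000000)
```

For instance, `bits_to_float(fadd_normal(float_to_bits(1.5), float_to_bits(2.25)))` evaluates to 3.75, and `bits_to_float(fsub_normal(float_to_bits(1.0), float_to_bits(3.0)))` evaluates to -2.0.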

Microarchitecture

The FPU core consists of the following units.

Pre Normalize Block for Add/Subtract - Calculates the difference between the smaller and larger exponent, adjusts the smaller fraction by right shifting it, and determines whether the operation is an add or a subtract after resolving the sign bits. Also checks for NaNs on the inputs.
Pre Normalize Block for Mul/Div - Computes the sum/difference of exponents, checks for exponent overflow, underflow condition and INF value on an input.
Add/Sub - 24 bit integer adder/subtractor.
Multiply - 2 cycle 24-bit boolean integer multiplier
Divide - 2 cycle integer divider and remainder computation unit
Post Normalize and Round Unit - Normalize fraction and exponent. Also do all the roundings in parallel and then pick the output corresponding to the chosen rounding mode.
Exceptions Unit - logic to stage and generate exception signals.
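
The idea behind the post normalize and round unit, computing every rounding candidate and then selecting one with rmode, can be sketched in software as follows. This is an illustration only, not the VHDL logic. The function names are hypothetical, the significand is treated as a magnitude with a separate sign bit, and `extra_bits` (at least 1) is the number of low-order bits being discarded.

```python
def round_all_modes(sign, sig, extra_bits):
    """Return ({rmode: rounded magnitude}, inexact) for one significand.

    sig carries extra_bits low-order bits beyond the precision that is kept.
    A result may be one bit wider than the kept field if rounding carries out.
    """
    kept = sig >> extra_bits
    rem = sig & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)
    inexact = rem != 0
    up_nearest = rem > half or (rem == half and bool(kept & 1))
    candidates = {
        0b00: kept + (1 if up_nearest else 0),               # nearest even
        0b01: kept,                                          # toward zero
        0b10: kept + (1 if inexact and sign == 0 else 0),    # toward +inf
        0b11: kept + (1 if inexact and sign == 1 else 0),    # toward -inf
    }
    return candidates, inexact


def round_significand(sign, sig, extra_bits, rmode):
    """Pick the candidate selected by rmode, as a multiplexer would."""
    candidates, inexact = round_all_modes(sign, sig, extra_bits)
    return candidates[rmode], inexact
```

The `inexact` value corresponds to the condition reported on the ine output: it is set whenever any discarded bit is nonzero.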

The block diagram is reproduced from Reference 1 and is given below.

Datapath and Pipeline

Code

pre_norm_arch.vhd
pre_norm_fmul_arch.vhd
add_sub27_arch.vhd
mul_r2_arch.vhd
div_r2_arch.vhd
post_norm_arch.vhd
except_arch.vhd
fpu_arch.vhd

Simulation

We used two strategies for testing. The first and simpler one was to write a test bench that exercised various features of the design. The floating point calculations are output to a file in an easy to read format with the hidden values recovered. This allowed us to hand verify the calculations.
A sample output file is given here.

Click here for the testbench and related package file.

The second strategy is the one used by the Opencores design, and the testing is much more extensive. A software implementation of the IEEE standard, SoftFloat, is used as the reference design. This library is used to generate millions of test vectors that are stored in a file. A test can be a random test that selects the operation and rounding mode at random, or a focused test that exercises a single FPU operation and rounding mode. The packed vectors are read from the test file and compared with the output produced by the FPU core. The model was simulated using ModelSim. All vectors that passed on the Verilog model also passed on the VHDL model.
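
The vector-based flow can be illustrated with a small software sketch. It is not the Opencores test harness and it does not use SoftFloat: Python's own arithmetic stands in as the reference, so only round to nearest even is covered, and the vector layout `(op, opa, opb, expected)` and all function names are hypothetical.

```python
import random
import struct


def f32_bits(x):
    """Round a Python float to single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(word):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def reference_result(op, a, b):
    """Reference model: 0 = add, 1 = subtract, 2 = multiply (nearest even)."""
    x, y = bits_f32(a), bits_f32(b)
    value = x + y if op == 0 else x - y if op == 1 else x * y
    return f32_bits(value)


def make_vectors(count, seed=0):
    """Generate random (op, opa, opb, expected) vectors with moderate operands."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(count):
        op = rng.choice((0, 1, 2))
        a = f32_bits(rng.uniform(-1e3, 1e3))
        b = f32_bits(rng.uniform(-1e3, 1e3))
        vectors.append((op, a, b, reference_result(op, a, b)))
    return vectors


def run_vectors(vectors, dut):
    """Return every vector whose DUT output differs from the expected word.

    dut is any callable taking (op, opa, opb) and returning a 32-bit word,
    for example a wrapper around simulator output.
    """
    return [v for v in vectors if dut(v[0], v[1], v[2]) != v[3]]
```

An empty list from `run_vectors` means every vector matched. A focused test in this sketch would simply fix `op` instead of choosing it at random.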

Synthesis

The use of operators made our code compact and simple. We found that most of the operators, except the division (/) and remainder (rem) operators, were synthesizable. We synthesized our design using an ASIC library at Intel Corporation with Synopsys Design Compiler and the DesignWare library. Note that the integer divider and remainder code was disabled during synthesis.

The first step is to analyze and elaborate the design. This ensures that there are no errors in the VHDL and prepares an unoptimized netlist.

dc_shell> define_design_lib work -path ../mra
dc_shell> analyze -library work -format vhdl ../hdl/add_sub27_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/div_r2_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/mul_r2_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/except_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/pre_norm_fmul_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/pre_norm_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/post_norm_arch.vhd
dc_shell> analyze -library work -format vhdl ../hdl/fpu_arch.vhd
dc_shell> elaborate -library work fpu
dc_shell> write -format db -hier -output fpu_pre.db

Next we set the wire load model, operating conditions and define the clock. Finally the model is compiled to the target library.
The synthesis script is given here.

Some of the questions we answered were:
1. Up to what frequency can the FPU operate?
The FPU can operate at a frequency of 100 MHz.

2. How many library cells are used?
A total of 6871 cells were used.

3. What are the critical paths?
The slowest timing paths are part of the post normalization unit, especially the generation of the underflow and overflow signals.

Some of the reports about the synthesis are:

fpu_check_design.rpt..> 08-Jul-2001 18:59 11k
fpu_compile_scr.txt 08-Jul-2001 18:59 2k
fpu_constraints_viol..> 08-Jul-2001 18:59 96k
fpu_loop.rpt.txt 08-Jul-2001 18:59 1k
fpu_timing.rpt.txt 08-Jul-2001 18:59 179k

Conclusions

We learned a lot in this project. We learned VHDL and Verilog coding and syntax, Floating Point Unit micro architecture, Floating Point Addition, Multiplication and Division algorithms, the IEEE standard for Binary Floating-Point Arithmetic, issues in design including pipelining, Verification Strategies and Synthesis.

References

1. Rudolf Usselmann, Documentation for Floating Point Unit, http://www.opencores.org.
2. John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 2nd Edition, Morgan Kaufmann, Appendix A.
3. Peter J. Ashenden, The Designer's Guide to VHDL, Morgan Kaufmann.
4. Donald E. Thomas and Philip R. Moorby, The Verilog Hardware Description Language, Kluwer Academic Publishers.
5. Stuart Oberman, “Design Issues in High Performance Floating-Point Arithmetic Units”, Stanford University Technical report.
6. IEEE, IEEE-754-1985 Standard for binary floating-point arithmetic.
