Single Precision Floating Point Unit

Mani Sudha Yalamanchi (sudhadimpy@yahoo.com)
Rajendar Koltur (rkoltur@yahoo.com)

Acknowledgements

Our work is based upon the openipcore Floating Point Unit Core designed and coded in Verilog by Rudolf Usselman. We are indebted to him and www.opencores.org for providing the source code. We also thank Raghu Yerramreddikalva of Intel Corporation for helping us with the synthesis of the floating point unit. With his help, we were able to synthesize our code using Synopsys and a real target library.

Objective

The objective of this project is to design a single precision floating point unit core in VHDL, then simulate and synthesize it. Currently the core supports only floating point addition, subtraction and multiplication.

Methodology

We started off by studying computer arithmetic in Reference 2. Next, we read the IEEE standard 754 on binary floating point arithmetic [6]. Reference 5 listed a number of algorithms for high performance floating point arithmetic. To handle the complexity, we leveraged an existing design in Verilog. We rewrote the code in VHDL, learning a lot about both languages in the process. We designed our own testbench and, in addition, adopted the testing methodology of the opencores design and reran their tests. Finally, we synthesized the design using a real ASIC library and wire load model.

Introduction

The FPU core has the following features.
  • Implements Single Precision (32-bit).
  • Implements Floating point addition, subtraction and multiplication
  • Implements all four rounding modes: round to nearest, round towards +inf, round towards -inf and round to zero.
  • All exceptions implemented and reported according to the IEEE standard.

    Entity

    Signal Descriptions

    Inputs:
    clk  clock
    opa, opb  input operands A and B
    rmode  rounding mode (00-round to nearest even, 01-round to zero, 10-round to +inf, 11-round to -inf)
    fpu_op floating point operation (0 - Add, 1 - Subtract, 2 - Multiply, 3 - Divide, 4 - Int to float conversion, 5 - Float to int conversion)

    Outputs:
    fpout  result output
    inf Asserted when output is the special value INF
    ine Asserted when the calculation is inexact, i.e. some accuracy has been lost during computation
    overflow Asserted when an overflow occurs, i.e. the number is too large to be represented.
    underflow Asserted when an underflow occurs, i.e. the number is too small to be represented.
    div_by_zero Asserted when the fpu_op is set to divide and opb is zero
    zero Asserted when the output is a numeric zero
    snan Asserted when either operand is a SNaN
    qnan Asserted when the output is a QNaN
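
    As an illustration of how the special-value outputs relate to the 32-bit encoding, the sketch below splits a word into its sign, exponent and fraction fields and classifies it. It is not taken from the core. The names unpack_fields, classify and float_to_bits are hypothetical, and the quiet/signalling distinction assumes the common convention that the most significant fraction bit marks a quiet NaN.

```python
import struct

def float_to_bits(x):
    """Return the 32-bit single precision pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def unpack_fields(word):
    """Split a 32-bit word into (sign, biased exponent, fraction)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, exponent, fraction

def classify(word):
    """Name the class of value encoded by a single precision word."""
    _, exponent, fraction = unpack_fields(word)
    if exponent == 0xFF:
        if fraction == 0:
            return "inf"
        # Assumption: fraction MSB set means quiet NaN, clear means signalling.
        return "qnan" if fraction & 0x400000 else "snan"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"
```

    For example, classify(0x7F800000) reports "inf", and unpack_fields(float_to_bits(-1.5)) yields sign 1, biased exponent 127 and fraction 0x400000.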

    Floating Point Multiplication Algorithm

    The algorithm for Floating Point Multiplication consists of the following steps.
  • Check for zeros, NaNs and inf on inputs.
  • Add the exponents
  • Multiply the mantissas
  • Normalize the product and round using the specified rounding mode.  Also generate exceptions.
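
    The four steps above can be sketched in software as follows. This is a simplified behavioural model, not the VHDL from the core: it rounds to nearest even only, treats denormal operands as zero, flushes results below the normal range to zero and reports no exception flags. The names fmul, bits and from_bits are hypothetical.

```python
import struct

QNAN = 0x7FC00000
INF = 0x7F800000

def bits(x):
    """Single precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(w):
    """Python float encoded by a single precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", w))[0]

def fmul(a, b):
    """Multiply two single precision words, rounding to nearest even."""
    sign = ((a ^ b) >> 31) & 1
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    fa, fb = a & 0x7FFFFF, b & 0x7FFFFF
    # Step 1: check for zeros, NaNs and inf on the inputs.
    a_nan, b_nan = ea == 0xFF and fa != 0, eb == 0xFF and fb != 0
    a_inf, b_inf = ea == 0xFF and fa == 0, eb == 0xFF and fb == 0
    a_zero, b_zero = ea == 0, eb == 0  # simplification: denormals count as zero
    if a_nan or b_nan or (a_inf and b_zero) or (b_inf and a_zero):
        return QNAN
    if a_inf or b_inf:
        return (sign << 31) | INF
    if a_zero or b_zero:
        return sign << 31
    # Step 2: add the exponents and remove one copy of the bias.
    exp = ea + eb - 127
    # Step 3: multiply the 24-bit mantissas with the hidden one restored.
    prod = (fa | 0x800000) * (fb | 0x800000)  # 48-bit product
    # Step 4: normalize so that 24 result bits remain, then round.
    shift = 23
    if prod & (1 << 47):  # product of the mantissas is in [2, 4)
        shift = 24
        exp += 1
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and mant & 1):
        mant += 1
        if mant == 1 << 24:  # rounding carried out of the mantissa
            mant >>= 1
            exp += 1
    if exp >= 255:
        return (sign << 31) | INF  # overflow
    if exp <= 0:
        return sign << 31  # simplification: flush to zero
    return (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)
```

    With these definitions, fmul(bits(1.5), bits(2.5)) returns the same word as bits(3.75).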

    Floating Point Addition Algorithm

    The algorithm for Floating Point Addition/Subtraction consists of the following steps.
  • Check for special values on inputs.
  • Align the mantissas, i.e. right shift the significand of the smaller operand by d bits, where d is the difference between the two exponents.
  • Add or subtract the mantissas
  • Normalize the result and round using the specified rounding mode. Also generate exceptions.
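
    The steps above can be sketched in software as follows. This is a simplified behavioural model under stated assumptions, not the VHDL from the core: it keeps three extra bits (guard, round and sticky) below the mantissa, rounds to nearest even only, treats denormal operands as zero, flushes results below the normal range to zero and reports no exception flags. The names fadd, bits and from_bits are hypothetical.

```python
import struct

QNAN = 0x7FC00000

def bits(x):
    """Single precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(w):
    """Python float encoded by a single precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", w))[0]

def fadd(a, b):
    """Add two single precision words, rounding to nearest even."""
    # Make a the operand with the larger magnitude.
    if (a & 0x7FFFFFFF) < (b & 0x7FFFFFFF):
        a, b = b, a
    sa, ea, fa = a >> 31, (a >> 23) & 0xFF, a & 0x7FFFFF
    sb, eb, fb = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    # Step 1: check for special values on the inputs.
    if (ea == 0xFF and fa) or (eb == 0xFF and fb):
        return QNAN
    if ea == 0xFF:  # a is inf; inf - inf is invalid
        return QNAN if (eb == 0xFF and sa != sb) else a
    if eb == 0:  # simplification: denormals count as zero
        return a if ea != 0 else (sa & sb) << 31
    # Step 2: align by right shifting the smaller mantissa by d bits.
    d = ea - eb
    ma = (fa | 0x800000) << 3  # room for guard, round and sticky bits
    mb = (fb | 0x800000) << 3
    sticky = 1 if mb & ((1 << d) - 1) else 0
    mb = (mb >> d) | sticky
    # Step 3: add or subtract the mantissas after resolving the signs.
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0  # exact cancellation gives +0 in round to nearest
    # Step 4: normalize, then round.
    exp = ea
    if m >> 27:  # carry out of the addition
        m = (m >> 1) | (m & 1)
        exp += 1
    while m < (1 << 26):  # leading zeros after a subtraction
        m <<= 1
        exp -= 1
    mant, rem = m >> 3, m & 7
    if rem > 4 or (rem == 4 and mant & 1):
        mant += 1
        if mant == 1 << 24:  # rounding carried out of the mantissa
            mant >>= 1
            exp += 1
    if exp >= 255:
        return (sa << 31) | 0x7F800000  # overflow
    if exp <= 0:
        return sa << 31  # simplification: flush to zero
    return (sa << 31) | (exp << 23) | (mant & 0x7FFFFF)
```

    In this model subtraction is addition with the sign of the second operand inverted, i.e. fadd(a, b ^ 0x80000000).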

    Microarchitecture

    The FPU core consists of the following units.

    Pre Normalize Block for Add/Subtract - Calculates the difference between the smaller and larger exponent, adjusts the smaller fraction by right shifting it, and determines whether the operation is an add or a subtract after resolving the sign bits. Checks for NaNs on inputs.
    Pre Normalize Block for Mul/Div - Computes the sum/difference of exponents, checks for exponent overflow, underflow condition and INF value on an input.
    Add/Sub - 24 bit integer adder/subtractor.
    Multiply - 2 cycle 24-bit boolean integer multiplier
    Divide - 2 cycle integer divider and remainder computation unit
    Post Normalize and Round Unit - Normalizes the fraction and exponent. Also performs all the roundings in parallel and then picks the output corresponding to the chosen rounding mode.
    Exceptions Unit - logic to stage and generate exception signals.
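
    The idea of computing every rounding in parallel and then selecting one can be illustrated with the sketch below. It is an assumption-laden model rather than the core's logic: round_select is a hypothetical name, the mantissa is treated as a plain integer magnitude with a guard bit and a sticky bit below it, and renormalization after a rounding carry is left out. The rmode encoding follows the signal description (00 nearest even, 01 zero, 10 +inf, 11 -inf).

```python
def round_select(sign, mant, guard, sticky, rmode):
    """Return (rounded mantissa, inexact flag) for the chosen rounding mode.

    sign   -- 0 for positive, 1 for negative
    mant   -- truncated mantissa magnitude (integer)
    guard  -- first bit below the mantissa
    sticky -- OR of all remaining lower bits
    rmode  -- 2-bit rounding mode as listed in the signal descriptions
    """
    inexact = bool(guard or sticky)
    up = mant + 1
    # All four candidates are formed first, as parallel hardware would do.
    candidates = {
        0b00: up if (guard and (sticky or mant & 1)) else mant,  # nearest even
        0b01: mant,                                              # toward zero
        0b10: up if (inexact and sign == 0) else mant,           # toward +inf
        0b11: up if (inexact and sign == 1) else mant,           # toward -inf
    }
    # The rounding mode then acts as the select input of a multiplexer.
    return candidates[rmode], inexact
```

    For instance, an exact tie on an even mantissa, round_select(0, 10, 1, 0, 0b00), keeps the mantissa at 10 and reports the result as inexact, which corresponds to the condition under which ine is asserted.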

    The block diagram is reproduced from Reference 1 and is given below.


     

    Datapath and Pipeline

    Code

    pre_norm_arch.vhd
    pre_norm_fmul_arch.vhd
    add_sub27_arch.vhd
    mul_r2_arch.vhd
    div_r2_arch.vhd
    post_norm_arch.vhd
    except_arch.vhd
    fpu_arch.vhd

    Simulation

    We used two strategies for testing.  The first and simpler one was to write a test bench that exercised various features of the design.  The floating point calculations are output to a file in an easy to read format with the hidden values recovered.  This allowed us to hand verify the calculations.
    A sample output file is given here.

    Click here for the testbench and related package file.

    The second strategy is the one used by the Opencores design, and its testing is much more extensive. SoftFloat, a software implementation of the IEEE standard, serves as the reference design. This library is used to generate millions of test vectors that are stored in a file. A test can be a random test that selects the operation and rounding mode at random, or a focused test that targets one FPU operation and rounding mode. The packed vectors are read from the test file and compared with the output produced by the FPU core. The model was simulated using Modelsim. All vectors that passed on the Verilog model also passed on the VHDL model.
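
    The vector-based flow can be pictured with the small sketch below. It is only an illustration: the real flow uses SoftFloat and the vector format of the Opencores test suite, neither of which is reproduced here. In this sketch Python's own arithmetic stands in as the reference, only multiplication in round to nearest mode with normal operands is covered, and the "opa opb expected" hex line format and all function names are hypothetical.

```python
import random
import struct

def bits(x):
    """Single precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(w):
    """Python float encoded by a single precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", w))[0]

def reference_mul(a, b):
    """Stand-in reference: product rounded to nearest single precision."""
    product = from_bits(a) * from_bits(b)  # exact in double precision
    try:
        return bits(product)
    except OverflowError:  # result too large for single precision
        return ((a ^ b) & 0x80000000) | 0x7F800000

def random_normal(rng):
    """Random word with a biased exponent in 1..254 (normal numbers only)."""
    return (rng.getrandbits(1) << 31) | (rng.randint(1, 254) << 23) | rng.getrandbits(23)

def make_vectors(count, seed=0):
    """Generate 'opa opb expected' lines in a hypothetical hex format."""
    rng = random.Random(seed)
    lines = []
    for _ in range(count):
        a, b = random_normal(rng), random_normal(rng)
        lines.append("%08x %08x %08x" % (a, b, reference_mul(a, b)))
    return lines

def count_mismatches(lines, model):
    """Replay the vectors against a model and count differing results."""
    bad = 0
    for line in lines:
        a, b, expected = (int(tok, 16) for tok in line.split())
        if model(a, b) != expected:
            bad += 1
    return bad
```

    A model under test is passed in as a function of two operand words, e.g. count_mismatches(make_vectors(1000), my_model), where my_model is whatever wraps the simulated design.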

    Synthesis

    The use of VHDL arithmetic operators kept our code compact and simple. We found that most of the operators, except the division (/) and remainder (rem) operators, were synthesizable. We synthesized our design using an ASIC library at Intel Corporation with Synopsys Design Compiler and the DesignWare library. Note that the integer divider and remainder code was disabled during synthesis.

    The first step is to analyze and elaborate the design. This ensures that there are no errors in the VHDL and prepares an unoptimized netlist.

    dc_shell> define_design_lib work -path ../mra
    dc_shell> analyze -library work -format vhdl ../hdl/add_sub27_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/div_r2_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/mul_r2_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/except_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/pre_norm_fmul_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/pre_norm_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/post_norm_arch.vhd
    dc_shell> analyze -library work -format vhdl ../hdl/fpu_arch.vhd
    dc_shell> elaborate -library work fpu
    dc_shell> write -format db -hier -output fpu_pre.db

    Next we set the wire load model, operating conditions and define the clock.  Finally the model is compiled to the target library.
    The synthesis script is given here.

    Some of the questions we answered were:
    1. Up to what frequency can the FPU operate?
    The FPU can operate at a frequency of 100 MHz.

    2. How many library cells are used?
    A total of 6871 cells were used.

    3. What are the critical paths?
    The slowest timing paths are part of the post normalization unit, especially the generation of the underflow and overflow signals.

    Some of the reports about the synthesis are:

    1. fpu_check_design.rpt..> 08-Jul-2001 18:59 11k
    2. fpu_compile_scr.txt 08-Jul-2001 18:59 2k
    3. fpu_constraints_viol..> 08-Jul-2001 18:59 96k
    4. fpu_loop.rpt.txt 08-Jul-2001 18:59 1k
    5. fpu_timing.rpt.txt 08-Jul-2001 18:59 179k


    Conclusions

    We learned a lot in this project: VHDL and Verilog coding and syntax, floating point unit microarchitecture, floating point addition, multiplication and division algorithms, the IEEE standard for binary floating-point arithmetic, design issues including pipelining, verification strategies and synthesis.

    References

    1. Rudolf Usselman, Documentation for Floating Point Unit, http://www.opencores.org.
    2. John L. Hennessy and David A. Patterson, Computer Architecture A Quantitative Approach, 2nd Edition, Morgan Kaufmann, Appendix A.
    3. Peter J. Ashenden, The Designer's Guide to VHDL, Morgan Kaufmann.
    4. Donald E. Thomas and Philip R. Moorby, The Verilog Hardware Description Language, Kluwer Academic Publishers.
    5. Stuart Oberman, “Design Issues in High Performance Floating-Point Arithmetic Units”, Stanford University Technical report.
    6. IEEE, IEEE-754-1985 Standard for binary floating-point arithmetic.