# IEEE 754 Floating Point ALU

A fundamental requirement of a GPU is the ability to handle floating-point
numbers.  In a discrete GPU, strict compliance with standards is not
necessary: NVIDIA, for example, added 12-bit floating-point types
because they save power and are accurate enough for certain computations.

However, for a hybrid CPU / GPU, we definitely need to be standards
compliant.  IEEE754 is the industry-wide standard, so that is what we
need.  A quick Google search reveals only a few options in
the libre / open world:

* Rudolf Usselmann's [asics.ws](http://asics.ws)
  [single-precision FPU](https://opencores.org/projects/fpu),
  written in Verilog.
* Jon Dawson's [single / double precision FPU](https://github.com/dawsonjon/fpu),
  also written in Verilog.
* John Hauser's [hardfloat](https://github.com/ucb-bar/berkeley-hardfloat/)
  library, written in Chisel3.
* The SiFive FPU, also written in Chisel3.

Jon's library has over 100 million test vectors per function, which is
extremely impressive.  It also looks very clear and readable.  However,
it does not appear to be pipelined, and we also need square root, which
it does not appear to provide.

Rudi's library has the advantage of being pipelined.  However, Rudi
is an extremely knowledgeable engineer, and his code looks heavily
optimised, based on decades of experience.  If we ever needed to adapt
it, doing so would be extremely difficult.

John Hauser is well known for his in-depth knowledge of IEEE754.  He
also wrote the SoftFloat library and an extensive test suite
(berkeley-testfloat-3).  All his work can be found under the
[ucb-bar](https://github.com/ucb-bar/) repositories.

SiFive's work is well known for being functionally correct, yet designed
in such a way that it is almost impossible to work with.  Documentation
and code comments are completely lacking, and an extremely high degree
of abstraction is used (extensive, undocumented use of object-orientated
multiple inheritance).  This makes maintenance a complete nightmare and
creates a heavy reliance on people with significantly above-average
intelligence, prodigious expertise, and decades of experience.
In addition, Chisel3 is converted to Verilog that is almost impossible
to read, as the conversion process goes through a state machine rather
than a language translation process.

So we are slightly stuck for options.  One of the key goals of this project
is to create code that is long-term maintainable.  That means clear
explanations, plenty of links to resources, well-documented and well-commented
code (but not completely overloaded with comments).

Much as it pains me to have to say it, we need to start from scratch.

This is not something that I like doing.  I like to save time and effort
by leveraging pre-existing resources.  As I pointed out and
[described on-list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000495.html),
a huge amount of time and resources can be saved through semi-automated
conversion of a well-chosen, pre-existing library, such that the risks
of introducing errors and making design mistakes are minimised.
However, as Aleksander rightly points out, because such an approach
leaves little room for *understanding* of the resultant (translated)
library, the semi-automated conversion route makes *collaborative*
development unlikely to succeed.  Think about it: if no one working on
the conversion actually understands what they are converting,
and errors occur (unit tests do not pass), what do they do?

(*A reflective moment here: again, for me, I am entering new territory,
taking into account input from team members that contradicts past
extensive experience.  It's quite uncomfortable and exciting at the
same time*)

One thing is clear: the ability to use the object-orientated features
of Python (through nMigen) will be critically important here to avoid
significant code proliferation.  Jon's library, for example, duplicates
the entire codebase for single and double precision, and we
will need *half* precision as well.

This is one very important lesson to be learned from SiFive's extensive
use of Chisel3: the FPU library is so thoroughly abstracted that
Quad-precision and Half-precision variants can be created with only a
few strategic lines of code, instead of duplicating hundreds to
thousands of lines, which carries the obvious risk of errors creeping
into the near-copies and being fixed in one but not the other.
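To make the point concrete, here is a minimal plain-Python sketch of how a
single parameterised class can describe every IEEE754 binary format.  It is
deliberately not nMigen code and is not taken from any of the libraries
above; the class name `FPFormat` and its `decode` / `encode` methods are
hypothetical.  Only the field widths (the standard binary16, binary32,
binary64 and binary128 layouts) come from IEEE754 itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FPFormat:
    """Hypothetical description of an IEEE754 binary format by field widths.

    One class covers every precision, so the field-handling logic is
    written once instead of being copied per format.
    """
    exp_bits: int  # width of the biased exponent field
    man_bits: int  # width of the stored mantissa (excludes the hidden bit)

    @property
    def width(self) -> int:
        # sign bit + exponent field + mantissa field
        return 1 + self.exp_bits + self.man_bits

    @property
    def bias(self) -> int:
        # exponent bias is 2^(exp_bits - 1) - 1
        return (1 << (self.exp_bits - 1)) - 1

    def decode(self, bits: int) -> tuple:
        """Split a raw bit pattern into (sign, biased exponent, mantissa)."""
        mantissa = bits & ((1 << self.man_bits) - 1)
        exponent = (bits >> self.man_bits) & ((1 << self.exp_bits) - 1)
        sign = (bits >> (self.exp_bits + self.man_bits)) & 1
        return sign, exponent, mantissa

    def encode(self, sign: int, exponent: int, mantissa: int) -> int:
        """Pack fields back into a raw bit pattern (assumes each field fits)."""
        return ((sign << (self.exp_bits + self.man_bits))
                | (exponent << self.man_bits)
                | mantissa)


# Each precision is a single line rather than a duplicated codebase.
HALF = FPFormat(exp_bits=5, man_bits=10)
SINGLE = FPFormat(exp_bits=8, man_bits=23)
DOUBLE = FPFormat(exp_bits=11, man_bits=52)
QUAD = FPFormat(exp_bits=15, man_bits=112)
```

For example, `SINGLE.decode(0x3F800000)` (the bit pattern of 1.0) returns
`(0, 127, 0)`.  A hardware version could presumably parameterise signal
widths in the same way, so that adding a new precision stays a one-line
change.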

There is a lot to be done here, mostly not in actual "work" but simply
in *understanding* how to go about doing IEEE754 arithmetic.  Rounding,
overflow, tininess, packing, normalisation, let alone optimisation: it's
a really big job.  We'll just have to see how it goes, and in the
meantime read up on resources such as this
[excellent guide](https://steve.hollasch.net/cgindex/coding/ieeefloat.html)
by Steve Hollasch.
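Rounding is a good example of something that has to be understood before
it can be implemented.  The hypothetical function below is a small sketch
under simplifying assumptions: a non-negative integer significand, only
the default round-to-nearest, ties-to-even mode, and no handling of the
carry-out that rounding can cause.  It discards `extra` low-order bits,
using the first discarded bit plus a "sticky" OR of everything below it
to decide whether to round up.

```python
def round_nearest_even(value: int, extra: int) -> int:
    """Drop `extra` low-order bits of a non-negative integer significand,
    rounding to nearest with ties going to the even result.

    Sketch only: it ignores the case where rounding up carries out of the
    top bit (e.g. 1.11...1 becoming 10.00...0), which a real FPU must
    handle by renormalising and adjusting the exponent.
    """
    if extra <= 0:
        return value
    kept = value >> extra                                # bits that survive
    half = (value >> (extra - 1)) & 1                    # first discarded bit
    sticky = (value & ((1 << (extra - 1)) - 1)) != 0     # any lower bit set?
    lsb = kept & 1                                       # lowest kept bit
    # Round up when above halfway (half and sticky), or exactly halfway
    # (half set, sticky clear) and the kept value is odd.
    if half and (sticky or lsb):
        kept += 1
    return kept
```

With `extra=2`, the value 10 (2.5 in units of the kept LSB) rounds to 2,
and 14 (3.5) rounds to 4: exact ties go to the even neighbour.  In an
adder the sticky bit is usually collected as bits are shifted out during
alignment, rather than computed at the end as it is here.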

This is basically a background update, explaining the rationale and
direction that's been chosen.  Future updates on the FPU will be much
more technical.