# FPU Development Progress

# Development Practices

Whenever I see people working with IDEs where the editor is run full-screen and yet the middle and right-hand side of the screen are entirely devoid of text, I cringe. I recently had to endure abuse and derision from an unethically-operated company for suggesting full compliance with PEP8 (which limits lines to a maximum of 79 characters). Yet at that same company I was sustaining a commit rate of over 500 commits per month, which exceeded the commit rate of the entire company of over 30 engineers. That raises the obvious question: how on earth am I able to sustain such a rapid development rate, and, more to the point, why isn't anyone else?

A key difference is that, firstly, I flatly refuse to use graphical IDEs. There is nothing they provide for rapid development that cannot be done faster with command-line tools and an efficient desktop layout. More than that, the time required to move a hand off the keyboard and onto a mouse, locate the cursor, move it, and click: all of that is time wasted.

{fpu-dev-screenshot.png}

[This post](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000567.html) explains further: it is essential to get as much information on-screen as can possibly be managed with the computing resources available. *Screen real-estate is king*. In computing we talk about the Virtual Memory "working set", the set of memory pages that must be in physical memory to avoid "thrashing" of swap-space; the **exact** same concept applies to editing and developing source code and to online research into APIs.

Below is a video which gives some insight into this development methodology, as well as a walk-through of the nmigen conversion process of Jon Dawson's excellent verilog IEEE754 FPU.

# Conversion of Jon Dawson's IEEE754 FPU to nmigen

The [initial conversion process](https://git.libre-riscv.org/?p=ieee754fpu.git;a=commit;h=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44) has been extremely rapid. With Aleksander's help, the 32-bit adder was working within around 2-3 days; DIV quickly followed, and conversion to 64-bit came next. Multiply was then added, and Jon Dawson's unit tests were adapted and run: tens of thousands of unit tests were executed, and several errors were found. Interestingly, some of them turned out to be in the *original* code. This is because John Hauser's softfloat-3 library, which is more recent than Jon Dawson's work, was used as the reference.

As this is a general-purpose library, an announcement was made, and a discussion on [librecores](https://lists.librecores.org/pipermail/discussion/2019-February/000687.html) followed, raising some interesting questions, such as: what is the gate latency (cycle time)? That question is still being evaluated.

As the GPU is going to have FP16 support added, FP16 was added to the IEEE754 unit with only a few actual lines of code, specifying the size of the mantissa and exponent in *one base class* alone (a minimal sketch of this kind of parameterisation appears at the end of this section). When it came to adding unit tests, it was quite straightforward to adapt the FP32 unit test code; however, we encountered [an anomaly](https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/JuRuL5HEIPM). When using sfpy (the Berkeley softfloat-3 python bindings), adding zero or minus zero to a non-canonical "NaN" resulted in the most weird responses from the softfloat library. We are still tracking this down.
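For readers unfamiliar with the distinction, here is a small standard-library sketch (illustrative only: this is not the project's test code, and the payload value is arbitrarily chosen) of what separates a canonical quiet NaN from a "non-canonical" one carrying extra payload bits, which is the kind of value involved in the anomaly.

```python
import struct

def f32_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

canonical_qnan = f32_from_bits(0x7FC00000)  # exponent all-ones, only the quiet bit set
noncanonical   = f32_from_bits(0x7FC12345)  # same exponent, arbitrary extra payload bits

# Both simply print as "nan", yet a library may propagate or canonicalise the
# payload differently, which is what makes 0.0 + non-canonical-NaN a corner case.
print(canonical_qnan, noncanonical)
```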
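Returning to the FP16 work mentioned above, the following is a minimal sketch, with invented class and attribute names (it is not the actual ieee754fpu code), of how a single nmigen base class parameterised only on exponent and mantissa widths can cover FP16, FP32 and FP64 alike.

```python
from nmigen import Signal

class FPNumber:
    """Hypothetical base class: the only per-format knowledge is the pair of
    (exponent, mantissa) bit widths."""
    WIDTHS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}

    def __init__(self, width):
        e, m = self.WIDTHS[width]
        self.width = width
        self.sign = Signal(name="sign")      # 1-bit sign
        self.exp  = Signal(e, name="exp")    # exponent field
        self.mant = Signal(m, name="mant")   # mantissa (fraction) field

# Adding FP16 support then amounts to little more than one extra instantiation:
fp16 = FPNumber(16)
fp32 = FPNumber(32)
fp64 = FPNumber(64)
```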
# Converting to a pipeline

This is where we are currently stumped, and our lack of experience with nmigen is showing. The desired outcome is to adapt the code, which is currently a state machine, so that it can be pipelined. However, this is a general-purpose library, and for certain engines (particularly DIV or SQRT) we would like to keep the state-machine form; so the idea is to create "Mixin" base classes that can make use of all the various stages, creating a state machine where needed or a pipeline where needed, without requiring the maintenance of two near-identical codebases. Doing so is proving somewhat irksome, and efforts are beginning to bury the actual hardware logic under mounds of abstraction: each stage needs its own separate inputs and outputs, and those need to be joined together.

Several months ago a [pipeline class](https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/pipeline_example.py;h=544b745b0a5d7b710b7d9eea38397acab5f4799a;hb=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44) was identified in PyRTL and adapted to nmigen. It works by overloading python's getattr and setattr in a class, which then auto-creates member instance signals with appropriate names, such as "variable-n-stage-1" when a variable "n" set in stage 1 is used in stage 2. All of that is hidden from the developer, leaving some extremely clear and obvious code (a simplified sketch of the technique is shown below).

The problem is that a state-based version of the same class does not exist, and, in addition, we are using classes *containing* Signals, and the way to adapt the code is not clear. In essence we are running into a "two unknowns" scenario: unfamiliarity with nmigen, *and* uncertainty over how to adapt the code while keeping it running at each stage so that the unit tests always pass. This is hampering decision-making, and a lot more thought is required.
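To make the PyRTL-derived technique concrete, here is a heavily simplified sketch (this is not the actual pipeline_example.py code; the class, naming scheme and details are illustrative): attribute writes in one stage silently create per-stage registers, and attribute reads in the following stage pick those registers up again.

```python
from nmigen import Signal

class SimplePipeline:
    """Illustrative only: auto-creates a per-stage register for every
    attribute assignment, by overloading __setattr__ / __getattr__."""

    def __init__(self, module):
        object.__setattr__(self, "_m", module)
        object.__setattr__(self, "_stage", 1)
        object.__setattr__(self, "_regs", {})

    def next_stage(self):
        object.__setattr__(self, "_stage", self._stage + 1)

    def __setattr__(self, name, expr):
        # "self.n = expr" in stage N creates (or reuses) a register named
        # "n_stageN" and latches expr into it on the clock edge.
        key = "%s_stage%d" % (name, self._stage)
        if key not in self._regs:
            self._regs[key] = Signal(len(expr), name=key)
        self._m.d.sync += self._regs[key].eq(expr)

    def __getattr__(self, name):
        # "self.n" read in stage N returns the register written in stage N-1.
        return self._regs["%s_stage%d" % (name, self._stage - 1)]
```

The attraction is that a stage body reads and writes plain attribute names while the per-stage bookkeeping stays hidden; the difficulty described above is that no state-machine equivalent of this class exists yet, and it is not obvious how to extend the idea to attributes that are themselves objects containing Signals.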