# FPU Development Progress

# Development Practices

Whenever I see people working with IDEs where the editor is run full-screen and yet the middle and right-hand side of the screen are entirely devoid of text, I cringe. I recently had to endure abuse and derision from an unethically-operated company for suggesting full compliance with PEP8 (which limits lines to a maximum of 79 characters). Yet at that same company I was sustaining a commit rate of over 500 commits per month, which exceeded the commit rate of the entire company of over 30 engineers. That raises the obvious question: how on earth am I able to sustain such a rapid development rate, and, more to the point, why isn't anyone else?

A key difference is that, firstly, I flatly refuse to use graphical IDEs. There is nothing they provide for rapid development that cannot be done faster with command-line tools and an efficient desktop layout. More than that, the time required to move a hand off the keyboard and onto a mouse, locate the cursor, move it, and click: all of that is time wasted.

{fpu-dev-screenshot.png}

[This post](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000567.html) explains further: it is essential to get as much information on-screen as can possibly be managed with the computing resources available. *Screen real-estate is king*. In computing we talk about the Virtual Memory "working set", the set of memory pages that must be in physical memory to avoid "thrashing" of swap-space; the **exact** same concept applies to editing and developing source code and to online research into APIs.

Below is a video which gives some insight into this development methodology, as well as a walk-through of the nmigen conversion process of Jon Dawson's excellent verilog IEEE754 FPU.

# Conversion of Jon Dawson's IEEE754 FPU to nmigen

The [initial conversion process](https://git.libre-riscv.org/?p=ieee754fpu.git;a=commit;h=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44) has been extremely rapid. With Aleksander's help, the 32-bit adder was working within around 2-3 days; DIV quickly followed, and conversion to 64-bit came next. Multiply was then added, and Jon Dawson's unit tests were adapted and run: tens of thousands of unit tests were executed, and several errors were found. Interestingly, some of them turned out to be in the *original* code. This is because John Hauser's softfloat-3 library, which is more recent than Jon Dawson's work, was used as the reference.

As this is a general-purpose library, an announcement was made, and a discussion on [librecores](https://lists.librecores.org/pipermail/discussion/2019-February/000687.html) followed, raising some interesting questions, such as: what is the gate latency (cycle time)? That question is still being evaluated.

As the GPU is going to have FP16 support added, FP16 was added to the IEEE754 unit with only a few actual lines of code, specifying the size of the mantissa and exponent in *one base class* alone (a minimal sketch of this kind of parameterisation appears at the end of this section). When it came to adding unit tests, it was quite straightforward to adapt the FP32 unit test code; however, we encountered [an anomaly](https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/JuRuL5HEIPM). When using sfpy (the Berkeley softfloat-3 python bindings), adding zero or minus zero to a non-canonical "NaN" resulted in the most weird responses from the softfloat library. We are still tracking this down.
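For readers unfamiliar with the distinction, here is a small standard-library sketch (illustrative only: this is not the project's test code, and the payload value is arbitrarily chosen) of what separates a canonical quiet NaN from a "non-canonical" one carrying extra payload bits, which is the kind of value involved in the anomaly.

```python
import struct

def f32_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

canonical_qnan = f32_from_bits(0x7FC00000)  # exponent all-ones, only the quiet bit set
noncanonical   = f32_from_bits(0x7FC12345)  # same exponent, arbitrary extra payload bits

# Both simply print as "nan", yet a library may propagate or canonicalise the
# payload differently, which is what makes 0.0 + non-canonical-NaN a corner case.
print(canonical_qnan, noncanonical)
```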
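Returning to the FP16 work mentioned above, the following is a minimal sketch, with invented class and attribute names (it is not the actual ieee754fpu code), of how a single nmigen base class parameterised only on exponent and mantissa widths can cover FP16, FP32 and FP64 alike.

```python
from nmigen import Signal

class FPNumber:
    """Hypothetical base class: the only per-format knowledge is the pair of
    (exponent, mantissa) bit widths."""
    WIDTHS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}

    def __init__(self, width):
        e, m = self.WIDTHS[width]
        self.width = width
        self.sign = Signal(name="sign")      # 1-bit sign
        self.exp  = Signal(e, name="exp")    # exponent field
        self.mant = Signal(m, name="mant")   # mantissa (fraction) field

# Adding FP16 support then amounts to little more than one extra instantiation:
fp16 = FPNumber(16)
fp32 = FPNumber(32)
fp64 = FPNumber(64)
```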
# Converting to a pipeline

This is where we are currently stumped, and our lack of experience with nmigen is showing. The desired outcome is to adapt the code, which is currently a state machine, so that it can be pipelined. However, this is a general-purpose library, and for certain engines (particularly DIV or SQRT) we would like to keep the state-machine form; so the idea is to create "Mixin" base classes that can make use of all the various stages, creating a state machine where needed or a pipeline where needed, without requiring the maintenance of two near-identical codebases. Doing so is proving somewhat irksome, and efforts are beginning to bury the actual hardware logic under mounds of abstraction: each stage needs its own separate inputs and outputs, and those need to be joined together.

Several months ago a [pipeline class](https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/pipeline_example.py;h=544b745b0a5d7b710b7d9eea38397acab5f4799a;hb=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44) was identified in PyRTL and adapted to nmigen. It works by overloading python's getattr and setattr in a class, which then auto-creates member instance signals with appropriate names, such as "variable-n-stage-1" when a variable "n" set in stage 1 is used in stage 2. All of that is hidden from the developer, leaving some extremely clear and obvious code (a simplified sketch of the technique is shown below).

The problem is that a state-based version of the same class does not exist, and, in addition, we are using classes *containing* Signals, and the way to adapt the code is not clear. In essence we are running into a "two unknowns" scenario: unfamiliarity with nmigen, *and* uncertainty over how to adapt the code while keeping it running at each stage so that the unit tests always pass. This is hampering decision-making, and a lot more thought is required.
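To make the PyRTL-derived technique concrete, here is a heavily simplified sketch (this is not the actual pipeline_example.py code; the class, naming scheme and details are illustrative): attribute writes in one stage silently create per-stage registers, and attribute reads in the following stage pick those registers up again.

```python
from nmigen import Signal

class SimplePipeline:
    """Illustrative only: auto-creates a per-stage register for every
    attribute assignment, by overloading __setattr__ / __getattr__."""

    def __init__(self, module):
        object.__setattr__(self, "_m", module)
        object.__setattr__(self, "_stage", 1)
        object.__setattr__(self, "_regs", {})

    def next_stage(self):
        object.__setattr__(self, "_stage", self._stage + 1)

    def __setattr__(self, name, expr):
        # "self.n = expr" in stage N creates (or reuses) a register named
        # "n_stageN" and latches expr into it on the clock edge.
        key = "%s_stage%d" % (name, self._stage)
        if key not in self._regs:
            self._regs[key] = Signal(len(expr), name=key)
        self._m.d.sync += self._regs[key].eq(expr)

    def __getattr__(self, name):
        # "self.n" read in stage N returns the register written in stage N-1.
        return self._regs["%s_stage%d" % (name, self._stage - 1)]
```

The attraction is that a stage body reads and writes plain attribute names while the per-stage bookkeeping stays hidden; the difficulty described above is that no state-machine equivalent of this class exists yet, and it is not obvious how to extend the idea to attributes that are themselves objects containing Signals.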