updates/016_2019feb17_fpu_dev_process.mdwn

# FPU Development Progress

# Development Practices

Whenever I see people working with IDEs where the editor is operated full-screen
and yet the middle and right-hand side of the window is entirely devoid of text,
I cringe.  I recently had to endure abuse and derision from an unethically-operated
company for suggesting full compliance with PEP 8 (which limits lines to a
maximum of 79 characters).  Yet at this same company, I was sustaining a
commit rate of over 500 commits per month.  This exceeded the combined commit
rate of the entire company, which had over 30 engineers.

This raises the obvious question: how on earth am I able to sustain such
a rapid development rate, and, more to the point, why isn't anyone else?
A key difference is that I flatly refuse to use graphical IDEs.  They
provide nothing of benefit to rapid development that cannot be done faster
with command-line tools and an efficient desktop layout.  More than that,
moving a hand off the keyboard and onto the mouse, locating the pointer,
moving it, and then clicking: all of that is time wasted.

{fpu-dev-screenshot.png}

[This post](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000567.html)
explains further that it is essential to get as much information on-screen
as the available computing resources allow.  *Screen real-estate is king*.
In computing we talk about the virtual memory "working set": the set of
memory pages that need to be resident in physical memory to avoid
"thrashing" of swap space.  The **exact** same concept applies to editing
and developing source code, and to researching APIs online: everything
being actively worked on needs to be visible at once, or time is lost
"swapping" between windows.

The video below gives some insight into how this development methodology
is used, and also walks through the process of converting Jon Dawson's
excellent Verilog IEEE754 FPU to nmigen.

<https://www.youtube.com/watch?v=A1-gWthveRI>

# Conversion of Jon Dawson's IEEE754 FPU to nmigen

The [initial conversion process](https://git.libre-riscv.org/?p=ieee754fpu.git;a=commit;h=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44)
has been extremely rapid.  With Aleksander's help, the 32-bit adder was
working within around 2-3 days.  Division quickly followed, then conversion
to 64-bit.  Multiply was then added, and Jon Dawson's unit tests were adapted
and run: tens of thousands of tests passed, and several errors were found.
Interestingly, some of those errors turned out to be in the *original* code.
This is because John Hauser's softfloat-3 library, which is more recent
than Jon Dawson's work, was used as the reference.
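The testing approach described above (random operands checked against a trusted reference) can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions and not the project's test code. `ref_add_f32`, `run_random_tests` and the other helpers are hypothetical names, and Python's own IEEE 754 double-precision arithmetic stands in for softfloat-3 as the reference. Rounding the double-precision sum to single precision gives the correctly rounded result because double carries at least 2p+2 bits for p = 24. Only round-to-nearest-even is covered, and NaN results are compared loosely because NaN payloads differ between implementations.

```python
import random
import struct


def f32_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(x):
    """Round a Python float (binary64) to binary32; return its bits."""
    try:
        return struct.unpack("<I", struct.pack("<f", x))[0]
    except OverflowError:
        # struct rejects finite values that round past the binary32 range;
        # with round-to-nearest-even they overflow to a signed infinity.
        return 0xFF800000 if x < 0 else 0x7F800000


def ref_add_f32(a, b):
    """Reference binary32 add: add in binary64, then round to binary32.

    binary64 carries 53 bits, at least 2*24 + 2, so this double rounding
    still yields the correctly rounded binary32 sum.
    """
    return f32_to_bits(f32_from_bits(a) + f32_from_bits(b))


def is_nan32(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def run_random_tests(dut_add, count=10000, seed=0):
    """Feed random bit patterns to dut_add and collect any mismatches."""
    rng = random.Random(seed)
    failures = []
    for _ in range(count):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        want, got = ref_add_f32(a, b), dut_add(a, b)
        # NaN payloads legitimately differ between implementations, so
        # only check that a NaN came out when a NaN was expected.
        ok = is_nan32(got) if is_nan32(want) else got == want
        if not ok:
            failures.append((a, b, want, got))
    return failures
```

Passing the reference itself, `run_random_tests(ref_add_f32)`, should return an empty list. In practice `dut_add` would be a (hypothetical) wrapper that drives the simulated adder and returns its 32-bit output.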

As this is a general-purpose library, an announcement was made, and a
discussion on
[librecores](https://lists.librecores.org/pipermail/discussion/2019-February/000687.html)
followed with some interesting questions, such as: what is the gate latency
(cycle time)?  This question is still being evaluated.

As the GPU is going to have FP16 support, FP16 was added to the IEEE754
unit with only a few actual lines of code, specifying the size of the
mantissa and exponent in *one base class* alone.  When it came to adding
unit tests, it was quite straightforward to adapt Jon Dawson's FP32 unit
test code; however, we encountered
[an anomaly](https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/JuRuL5HEIPM).
When using sfpy (the Python bindings for Berkeley softfloat-3), adding zero
or minus zero to a non-canonical NaN produced some very strange results
from the softfloat library.  We're still tracking this down.

# Converting to a pipeline

This is where we are currently stumped, and our lack of experience with
nmigen is showing.  The goal is to adapt the code, which is currently a
state machine, so that it can be pipelined.  However, this is a
general-purpose library, and for certain engines (particularly DIV or SQRT)
we would like to keep the state machine.  The idea is therefore to create
"Mixin" base classes that can make use of all the various stages, creating
a state machine or a pipeline as needed, without requiring maintenance of
two near-identical codebases.

Doing so is proving somewhat irksome, and our efforts are beginning to bury
the actual hardware logic under mounds of abstraction.  Each stage needs
its own separate inputs and outputs, and the stages then need to be joined
together.  Several months ago, a
[pipeline class](https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/pipeline_example.py;h=544b745b0a5d7b710b7d9eea38397acab5f4799a;hb=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44)
from PyRTL was identified and adapted to nmigen.  It works by overloading
Python's getattr and setattr in a class, which then auto-creates member
instance signals with the appropriate names: for example, a signal named
"variable-n-stage-1" is created when the variable "n", assigned in stage 1,
is used in stage 2.  All of that is hidden from the developer, leaving some
extremely clear and obvious code.

The problem is that there is no state-machine-based version of the same
class.  In addition, we are using classes *containing* Signals, and it is
not clear how to adapt the pipeline class to work with them.

In essence, we are running into a "two unknowns" scenario.  Unfamiliarity
with nmigen, *and* with how to adapt the code while keeping it running at
each step so that the unit tests always pass, is hampering decision-making.
A lot more thought is required.
  95 stage so that the unit tests always pass, is hampering decision-making.
  96 A lot more thought is required.