# FPU Development Progress

# Development Practices

Whenever I see people working with IDEs where the editor is operated full-screen
and yet the middle and right-hand side of the screen are entirely devoid of
text, I cringe. I recently had to endure abuse and derision from an
unethically-operated company for suggesting full compliance with PEP 8
(PEP 8 sets a maximum of 79 characters per line). Yet at this same company,
I was sustaining a rate of over 500 commits per month. This exceeded the
combined commit rate of the entire company of over 30 engineers.

This raises the obvious question: how on earth am I able to sustain such
a rapid development rate, and, more to the point, why isn't anyone else?
The first key difference is that I flatly refuse to use graphical IDEs.
They provide nothing of benefit to rapid development that cannot be done
faster with command-line tools and an efficient desktop layout. More than
that, moving a hand off the keyboard and onto the mouse, locating the
cursor, moving it, and then clicking: all of that is time wasted.

{fpu-dev-screenshot.png}

[This post](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000567.html)
explains further that it is essential to get as much information on-screen
as can possibly be managed with the computing resources available.
*Screen real-estate is king*. In computing we talk about the Virtual Memory
"working set": the set of memory pages that need to be in physical memory
to avoid "thrashing" of swap-space. The **exact** same concept applies to
editing and developing source code, and to online research into APIs.
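
For readers unfamiliar with the term, here is a minimal Python sketch of the classic working-set definition W(t, τ): the distinct pages referenced during the last τ references up to time t. The function name and the page-reference string are illustrative assumptions only. When the working set is larger than the available physical frames, pages are evicted and re-fetched over and over, which is thrashing. The analogy is that the source code and API documentation in active use need to fit on screen together.

```python
def working_set(refs, t, tau):
    """Return W(t, tau): the distinct pages referenced in the window of the
    last `tau` references ending at (0-based) time step `t`."""
    start = max(0, t - tau + 1)
    return set(refs[start:t + 1])


# Illustrative page-reference string (page numbers only).
refs = [1, 2, 1, 3, 4, 1, 2]
sizes = [len(working_set(refs, t, tau=3)) for t in range(len(refs))]
print(sizes)  # [1, 2, 2, 3, 3, 3, 3]
```

From t = 3 onwards this toy program needs three pages resident at once, so two physical frames would not be enough to hold its working set.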

Below is a video which gives some insight into the use of this development
methodology, as well as a walk-through of the nmigen conversion process
for Jon Dawson's excellent Verilog IEEE754 FPU.

<https://www.youtube.com/watch?v=A1-gWthveRI>

# Conversion of Jon Dawson's IEEE754 FPU to nmigen

The [initial conversion process](https://git.libre-riscv.org/?p=ieee754fpu.git;a=commit;h=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44)
has been extremely rapid. With Aleksander's help, the 32-bit adder was
working within around 2-3 days. Division quickly followed, and next came
conversion to 64-bit. Multiply was then added, and Jon Dawson's unit tests
were adapted and run: tens of thousands of tests passed, and several errors
were found. Interestingly, some of those errors turned out to be in the
*original* code. This is because John Hauser's softfloat-3 library, which
is more recent than Jon Dawson's work, was used to check the results.
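
As a rough illustration of this style of randomized testing against a software reference, here is a minimal Python sketch. It swaps in a different reference from the one actually used (sfpy/softfloat-3): it uses Python's own doubles plus the `struct` module to produce correctly rounded FP32 addition, in round-to-nearest-even mode only. This relies on the known result that, because a double has at least 2×24+2 significand bits, computing the sum in double precision and then rounding to single gives the same answer as one correctly rounded single-precision add. `dut_add` is a hypothetical stand-in for a wrapper that drives the simulated hardware and returns the 32-bit result. Here the reference itself is passed in just to exercise the harness.

```python
import random
import struct

F32_EXP_MASK = 0x7F800000
F32_MANT_MASK = 0x007FFFFF


def bits_to_f32(bits):
    """Reinterpret a 32-bit pattern as an IEEE754 single (as a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(x):
    """Round a Python float (a double) to single precision; return its bits."""
    try:
        return struct.unpack("<I", struct.pack("<f", x))[0]
    except OverflowError:
        # struct rejects finite doubles that round to infinity in single
        # precision; under round-to-nearest-even the IEEE754 result is +/-inf.
        return 0x7F800000 if x > 0 else 0xFF800000


def ref_add_f32(a_bits, b_bits):
    """Reference FP32 add (round-to-nearest-even only), via double rounding."""
    return f32_to_bits(bits_to_f32(a_bits) + bits_to_f32(b_bits))


def is_nan_f32(bits):
    return (bits & F32_EXP_MASK) == F32_EXP_MASK and (bits & F32_MANT_MASK) != 0


def results_match(expected, got):
    """Bit-exact comparison, except that any NaN is accepted for a NaN result.
    (A stricter RISC-V test could demand the canonical NaN instead.)"""
    if is_nan_f32(expected):
        return is_nan_f32(got)
    return expected == got


def run_random_add_tests(dut_add, count=10000, seed=0):
    """Feed random 32-bit patterns to the hypothetical `dut_add` wrapper and
    return a list of mismatching cases."""
    rng = random.Random(seed)
    failures = []
    for _ in range(count):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        expected, got = ref_add_f32(a, b), dut_add(a, b)
        if not results_match(expected, got):
            failures.append((hex(a), hex(b), hex(expected), hex(got)))
    return failures


# Exercising the harness with the reference itself reports no failures.
print(run_random_add_tests(ref_add_f32, count=1000))  # []
```

In a real test bench, `dut_add` would step the nmigen simulation; that wiring is not shown here.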

As this is a general-purpose library, an announcement was made, and a
discussion on
[librecores](https://lists.librecores.org/pipermail/discussion/2019-February/000687.html)
followed with some interesting questions, such as: what is the gate latency
(cycle time)? This question is still being evaluated.

As the GPU is going to have FP16 support, FP16 was added to the IEEE754
unit with only a few actual lines of code, specifying the sizes of the
mantissa and exponent in *one base class* alone. When it came to adding
unit tests, it was quite straightforward to adapt Jon Dawson's FP32 unit
test code; however, we encountered
[an anomaly](https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/JuRuL5HEIPM).
When using sfpy (the Berkeley softfloat-3 Python bindings), adding zero or
minus zero to a non-canonical NaN produced some very strange results from
the softfloat library. We're still tracking this down.
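
The "one base class" approach can be illustrated with a small, hypothetical Python descriptor in which every width-dependent constant is derived from just the exponent and mantissa sizes, so adding FP16 takes one line. `FPFormat` and its methods are names invented for this sketch, not the ieee754fpu code. The NaN helpers use the RISC-V definition of the canonical NaN (sign clear, all-ones exponent, only the quiet bit set in the mantissa), which is the distinction that matters for the non-canonical NaN anomaly; the sketch does not explain the anomaly itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FPFormat:
    """Hypothetical IEEE754 binary format descriptor."""
    exp_bits: int    # width of the exponent field
    mant_bits: int   # stored mantissa (fraction) bits, excluding the hidden bit

    @property
    def width(self):
        return 1 + self.exp_bits + self.mant_bits

    @property
    def bias(self):
        return (1 << (self.exp_bits - 1)) - 1

    @property
    def exp_all_ones(self):
        return (1 << self.exp_bits) - 1

    def decode(self, bits):
        """Split a bit pattern into (sign, biased exponent, mantissa) fields."""
        sign = (bits >> (self.width - 1)) & 1
        exp = (bits >> self.mant_bits) & self.exp_all_ones
        mant = bits & ((1 << self.mant_bits) - 1)
        return sign, exp, mant

    def is_nan(self, bits):
        _, exp, mant = self.decode(bits)
        return exp == self.exp_all_ones and mant != 0

    def is_quiet_nan(self, bits):
        # IEEE754-2008 recommends the mantissa MSB as the "quiet" flag.
        return self.is_nan(bits) and ((bits >> (self.mant_bits - 1)) & 1) == 1

    def canonical_nan(self):
        # RISC-V canonical NaN: sign 0, all-ones exponent, only the quiet bit set.
        return (self.exp_all_ones << self.mant_bits) | (1 << (self.mant_bits - 1))


FP16 = FPFormat(exp_bits=5, mant_bits=10)
FP32 = FPFormat(exp_bits=8, mant_bits=23)
FP64 = FPFormat(exp_bits=11, mant_bits=52)

for name, fmt in (("FP16", FP16), ("FP32", FP32), ("FP64", FP64)):
    print(name, fmt.width, fmt.bias, hex(fmt.canonical_nan()))
# FP16 16 15 0x7e00
# FP32 32 127 0x7fc00000
# FP64 64 1023 0x7ff8000000000000

# A non-canonical (but still quiet) FP16 NaN: an extra payload bit is set.
odd_nan = FP16.canonical_nan() | 1
print(FP16.is_nan(odd_nan), FP16.is_quiet_nan(odd_nan),
      odd_nan == FP16.canonical_nan())
# True True False
```

A test harness can use helpers like these to classify NaN inputs as canonical or not before deciding how strictly to compare the results.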

# Converting to a pipeline

This is where we are currently stumped, and our lack of experience with
nmigen is showing through. The desired outcome is to adapt the code, which
is a state machine, so that it can be pipelined. However, as this is a
general-purpose library, and as we would also like to keep certain engines
(particularly DIV and SQRT) as state machines, the idea is to create
"Mixin" base classes that can make use of all the various stages, creating
a state machine or a pipeline as needed, without requiring maintenance of
two near-identical codebases.
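
The "write the stages once, run them either way" idea can be sketched in plain Python, without nmigen. This is a hypothetical cycle-level model, not the ieee754fpu code. Each stage is a pure function; `PipelineRunner` gives every stage its own register, so a new operand can enter on every clock, while `FSMRunner` reuses a single register and steps through the same stages one per clock. It uses composition (handing a list of stages to a runner) rather than the Mixin base classes described above, and the arithmetic stages are placeholders for real FPU stages.

```python
class Stage:
    """One combinational step: a pure function of its input."""
    def process(self, data):
        raise NotImplementedError


class AddConst(Stage):          # placeholder stage
    def __init__(self, k):
        self.k = k

    def process(self, data):
        return data + self.k


class Double(Stage):            # placeholder stage
    def process(self, data):
        return data * 2


class PipelineRunner:
    """One register per stage: a new input may enter on every clock."""
    def __init__(self, stages):
        self.stages = stages
        self.regs = [None] * len(stages)   # None marks an empty slot (a "bubble")

    def clock(self, new_input=None):
        nxt, prev = [], new_input
        for stage, reg in zip(self.stages, self.regs):
            # Stage i reads the register that stage i-1 wrote on the previous clock.
            nxt.append(None if prev is None else stage.process(prev))
            prev = reg
        self.regs = nxt                     # all registers update together
        return self.regs[-1]                # value now held after the last stage


class FSMRunner:
    """One shared register plus a state counter: one stage per clock."""
    def __init__(self, stages):
        self.stages = stages
        self.state = None                   # index of the next stage; None = idle
        self.reg = None

    def clock(self, new_input=None):
        if self.state is None:
            if new_input is None:
                return None
            self.reg, self.state = new_input, 0
        # Inputs offered while busy are ignored (hardware would drop "ready").
        self.reg = self.stages[self.state].process(self.reg)
        self.state += 1
        if self.state == len(self.stages):
            self.state = None
            return self.reg
        return None


def run_pipeline(stages, inputs):
    pipe, pending, results, cycles = PipelineRunner(stages), list(inputs), [], 0
    while len(results) < len(inputs):
        out = pipe.clock(pending.pop(0) if pending else None)
        cycles += 1
        if out is not None:
            results.append(out)
    return results, cycles


def run_fsm(stages, inputs):
    fsm, results, cycles = FSMRunner(stages), [], 0
    for x in inputs:
        out, cycles = fsm.clock(x), cycles + 1
        while out is None:
            out, cycles = fsm.clock(), cycles + 1
        results.append(out)
    return results, cycles


stages = [AddConst(1), Double(), AddConst(-3)]   # computes 2*x - 1
print(run_pipeline(stages, [1, 2, 3]))   # ([1, 3, 5], 5)
print(run_fsm(stages, [1, 2, 3]))        # ([1, 3, 5], 9)
```

Both runners produce identical results from the same stage objects. The pipeline completes three operations in 5 clocks, while the state machine needs 9 but holds only one register, which illustrates the kind of trade-off involved.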

Doing so is proving somewhat irksome, and our efforts are beginning to bury
the actual hardware logic under mounds of abstraction. Each stage needs its
own separate inputs and outputs, and the stages then need to be joined
together. Several months ago a
[pipeline class](https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/pipeline_example.py;h=544b745b0a5d7b710b7d9eea38397acab5f4799a;hb=d26d9dd46e9fd22a1f89357a6fbcecf0eb723f44)
was identified in PyRTL and adapted to nmigen. It works by overloading
Python's attribute access (getattr and setattr) in a class, which then
auto-creates member instance signals with appropriate names, such as
"variable-n-stage-1" when the variable "n", set in stage 1, happens to be
used in stage 2. All of that is hidden from the developer, leaving some
extremely clear and obvious code.
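
The attribute-overloading trick can be illustrated with a hypothetical, pure-Python mimic. It is not the PyRTL or nmigen API and it does not simulate clock cycles. It pushes one set of values through the stages, storing each value assigned in stage k under a key for stage k+1 (where the hardware version would create a pipeline register) and recording an auto-generated name for it. `SimplePipelineSketch`, `MulAdd` and the `stageN` method convention are inventions for this example.

```python
class SimplePipelineSketch:
    """Hypothetical mimic: a value assigned to self.<name> in stage k becomes
    readable as self.<name> in stage k+1."""

    def __init__(self):
        # Internal bookkeeping bypasses our own __setattr__.
        object.__setattr__(self, "_stage", 0)
        object.__setattr__(self, "_regs", {})       # (stage, name) -> value
        object.__setattr__(self, "_reg_names", [])  # auto-generated names

    def __setattr__(self, name, value):
        nxt = self._stage + 1
        self._regs[(nxt, name)] = value
        self._reg_names.append(f"{name}_stage{self._stage}to{nxt}")

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for pipeline variables.
        try:
            return self._regs[(self._stage, name)]
        except KeyError:
            raise AttributeError(
                f"pipeline variable {name!r} not available in stage {self._stage}"
            ) from None

    def run(self, **inputs):
        for name, value in inputs.items():
            self._regs[(0, name)] = value
        k = 0
        while hasattr(type(self), f"stage{k}"):
            object.__setattr__(self, "_stage", k)
            getattr(self, f"stage{k}")()
            k += 1
        return {n: v for (s, n), v in self._regs.items() if s == k}


class MulAdd(SimplePipelineSketch):
    """Hypothetical three-stage example computing a * b + c."""
    def stage0(self):
        self.product = self.a * self.b
        self.c = self.c              # in this sketch, forward c explicitly

    def stage1(self):
        self.total = self.product + self.c

    def stage2(self):
        self.result = self.total


pipe = MulAdd()
print(pipe.run(a=3, b=4, c=5))   # {'result': 17}
print(pipe._reg_names)
# ['product_stage0to1', 'c_stage0to1', 'total_stage1to2', 'result_stage2to3']
```

The stage methods read like ordinary sequential code, while all of the per-stage naming happens inside `__setattr__` and `__getattr__`.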

The problem is that there is no state-machine version of the same class,
and, in addition, we are using classes *containing* Signals, and how to
adapt the pipeline code to them is not clear.

In essence we are running into a "two unknowns" scenario. Unfamiliarity
with nmigen *and* with how to adapt the code (keeping it working at each
step so that the unit tests always pass) is hampering decision-making.
A lot more thought is required.