From 2d9e554fea3d3329832d80c46e0a4a594257d1ac Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 12 Feb 2019 06:05:10 +0000
Subject: [PATCH] add FPU intro

---
 updates/014_2019feb12_floating_point.mdwn | 90 +++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 updates/014_2019feb12_floating_point.mdwn

diff --git a/updates/014_2019feb12_floating_point.mdwn b/updates/014_2019feb12_floating_point.mdwn
new file mode 100644
index 0000000..21c7b6c
--- /dev/null
+++ b/updates/014_2019feb12_floating_point.mdwn
@@ -0,0 +1,90 @@
+# IEEE 754 Floating Point ALU
+
+A fundamental requirement of a GPU is to handle floating-point
+numbers. In a discrete GPU, compliance with standards is not
+strictly necessary: NVIDIA, for example, added 12-bit floating-point types
+because it saves power and is accurate enough for certain computations.
+
+However, for a hybrid CPU / GPU, we definitely need to be standards
+compliant. IEEE754 is the industry-wide standard, so that's basically
+what we need. A quick Google search reveals only a few options in
+the libre / open world:
+
+* Rudolf Usselmann's [asics.ws](http://asics.ws)
+  [single-precision FPU](https://opencores.org/projects/fpu),
+  written in Verilog
+* Jon Dawson's [single / double precision FPU](https://github.com/dawsonjon/fpu)
+  also written in Verilog
+* John Hauser's [hardfloat](https://github.com/ucb-bar/berkeley-hardfloat/)
+  library, written in Chisel3.
+* The SiFive FPU (also written in Chisel3).
+
+Jon's library has over 100 million test vectors per function, which is
+extremely impressive. It also looks very clear and readable. However,
+it does not look like it is pipelined, and we also need square-root.
+
+Rudi's library has the advantage that it is pipelined. However, Rudi
+is an extremely knowledgeable engineer, and the code that he has written
+looks to be heavily optimised, based on decades of experience.
+If we ever needed to adapt it, it would be extremely difficult to do so.
+
+John Hauser is well-known for his in-depth knowledge of IEEE754. He
+also wrote a softfloat library and an extensive test suite
+(berkeley-testfloat-3). All his work can be found under the
+[ucb-bar](https://github.com/ucb-bar/) repositories.
+
+SiFive's work is well-known for being functionally correct yet designed
+in such a way that it is almost impossible to work with. Documentation
+and code comments are completely lacking, and an extremely high degree
+of abstraction is used (extensive undocumented use of object-orientated
+multiple inheritance) that makes maintenance a complete nightmare, creating
+a heavy reliance on people with significantly above-average intelligence,
+prodigious expertise, and decades of experience.
+In addition, Chisel3 is converted to Verilog that is almost impossible
+to read, as the conversion process goes through a state machine, rather
+than a language translation process.
+
+So we are slightly stuck for options. One of the key goals of this project
+is to create code that is long-term maintainable. That means clear
+explanations, plenty of links to resources, well-documented and well-commented
+code (but not completely overloaded with comments).
+
+Much as it pains me to have to say it, we need to start from scratch.
+
+This is not something that I like doing. I like to save time and effort
+by leveraging pre-existing resources. As I pointed out and
+[described on-list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000495.html),
+a huge amount of time and resources can be saved through a semi-automated
+conversion process of a pre-existing well-chosen library, such that the risk
+of introducing errors and making design mistakes is minimised.
+However, as Aleksander rightly points out, because such an approach
+leaves little room for *understanding* of the resultant (translated)
+library, the semi-automated conversion route makes *collaborative*
+development unlikely to succeed. Think about it: if no one working on
+the conversion actually understands what it is that they are converting,
+and errors occur (unit tests do not pass), what do they do?
+
+(*A reflective moment here: again, for me, I am entering new territory,
+taking into account input from team members that contradicts past
+extensive experience. It's quite uncomfortable and exciting at the
+same time*)
+
+One thing that is clear: being able to use the object-orientated features
+of Python (through nmigen) will be critically important here, to avoid
+significant code proliferation. Jon's library, for example, duplicates
+the entirety of the codebase for single and double precision, and we
+will need *half* precision as well.
+
+This is one very important lesson to be learned from the extensive
+use of Chisel3 by SiFive: the FPU library is entirely abstracted
+such that the creation of Quad-precision and Half-precision may
+be done with only a few strategic lines of code instead of massive
+duplication of hundreds to thousands, which has the obvious disadvantage
+that errors creep into the near-copies and get fixed in one but
+not the other.
+
+There is a lot to be done here, mostly not in actual "work", but simply
+in *understanding* how to go about doing IEEE754 arithmetic. Rounding,
+overflow, tininess, packing, normalisation, let alone optimisation: it's
+a really big job. We'll just have to see how it goes.
+
-- 
2.30.2