# IEEE 754 Floating Point ALU

A fundamental requirement of a GPU is to handle floating-point numbers. In a discrete GPU, compliance with standards is not strictly necessary: NVIDIA added 12-bit floating-point types for example, because it saves power and is accurate enough for certain computations. However, for a hybrid CPU / GPU, we definitely need to be standards compliant. IEEE754 is the industry-wide standard, so that's basically what we need.

A quick Google search reveals only a few options in the libre / open world:

* Rudolf Usselman's [asics.ws](http://asics.ws) [single-precision FPU](https://opencores.org/projects/fpu), written in Verilog
* Jon Dawson's [single / double precision FPU](https://github.com/dawsonjon/fpu), also written in Verilog
* John Hauser's [hardfloat](https://github.com/ucb-bar/berkeley-hardfloat/) library, written in Chisel3
* The SiFive FPU (also written in Chisel3)

Jon's library has over 100 million test vectors per function, which is extremely impressive. It also looks very clear and readable. However, it does not look like it is pipelined, and we also need square-root.

Rudi's library has the advantage that it is pipelined. However, Rudi is an extremely knowledgeable engineer, and the code that he has written looks to be heavily optimised, based on decades of experience. If we ever needed to adapt it, it would be extremely difficult to do so.

John Hauser is well-known for his in-depth knowledge of IEEE754. He also wrote a softfloat library and an extensive test suite (berkeley-testfloat-3). All his work can be found under the [ucb-bar](https://github.com/ucb-bar/) repositories.

SiFive's work is well-known for being functionally correct, yet designed in such a way that it is almost impossible to work with.
Documentation and code comments are completely lacking, and an extremely high degree of abstraction is used (extensive undocumented use of object-orientated multiple inheritance) that makes maintenance a complete nightmare, creating a heavy reliance on people with significantly above-average intelligence, prodigious expertise, and decades of experience. In addition, Chisel3 is converted to Verilog that is almost impossible to read, as the conversion process goes through a state machine, rather than a language translation process.

So we are slightly stuck for options. One of the key goals of this project is to create code that is long-term maintainable. That means clear explanations, plenty of links to resources, and well-documented, well-commented code (but not completely overloaded with comments).

Much as it pains me to have to say it, we need to start from scratch. This is not something that I like doing. I like to save time and effort by leveraging pre-existing resources. As I pointed out and [described on-list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000495.html), a huge amount of time and resources can be saved through a semi-automated conversion process of a pre-existing well-chosen library, such that the risk of introducing errors and making design mistakes is minimised.

However, as Aleksander rightly points out, such an approach leaves little room for *understanding* of the resultant (translated) library, and that makes *collaborative* development unlikely to succeed. Think about it: if no one working on the conversion actually understands what it is that they are converting, and errors occur (unit tests do not pass), what do they do?

(*A reflective moment here: again, for me, I am entering new territory, taking into account input from team members that contradicts past extensive experience.
It's quite uncomfortable and exciting at the same time*)

One thing that is clear: the means to use the object-orientated features of Python (through nmigen) will be critically important here, to avoid significant code-proliferation. Jon's library for example duplicates the entirety of the codebase for single and double precision, and we will need *half* precision as well. This is one very important lesson to be learned from the extensive use of Chisel3 by SiFive: the FPU library is entirely abstracted such that the creation of Quad-precision and Half-precision may be done with only a few strategic lines of code, instead of massive duplication of hundreds to thousands of lines. Duplication has the obvious disadvantage that errors creep into the near-copies and get fixed in one but not the other.

There is a lot to be done here, mostly not in actual "work", but simply in *understanding* how to go about doing IEEE754 arithmetic. Rounding, overflow, tininess, packing, normalisation, let alone optimisation: it's a really big job. We'll just have to see how it goes, and in the meantime, read up on resources such as this [excellent guide](https://steve.hollasch.net/cgindex/coding/ieeefloat.html) by Steve Hollasch.

This is basically a background update, explaining the rationale and direction that's been chosen. Future updates on the FPU will be much more technical.
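
As a small illustration of the code-proliferation point above, here is a minimal sketch in plain Python (not nmigen, and not code from this project) of how a single class can describe half, single, double and quad precision purely through two parameters. The names `FPFormat`, `unpack`, `pack` and `classify` are hypothetical, chosen only for this example; the field widths are the standard IEEE 754 binary interchange ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FPFormat:
    """Hypothetical description of an IEEE 754 binary format.

    Everything else (total width, bias, field masks) is derived from
    the two field widths, so no per-precision code is duplicated.
    """
    exp_bits: int  # width of the biased exponent field
    man_bits: int  # width of the stored mantissa (fraction) field

    @property
    def width(self):
        # one sign bit + exponent field + mantissa field
        return 1 + self.exp_bits + self.man_bits

    @property
    def bias(self):
        return (1 << (self.exp_bits - 1)) - 1

    def unpack(self, bits):
        """Split a raw bit pattern into (sign, biased exponent, mantissa)."""
        man = bits & ((1 << self.man_bits) - 1)
        exp = (bits >> self.man_bits) & ((1 << self.exp_bits) - 1)
        sign = (bits >> (self.width - 1)) & 1
        return sign, exp, man

    def pack(self, sign, exp, man):
        """Inverse of unpack: reassemble the raw bit pattern."""
        return (sign << (self.width - 1)) | (exp << self.man_bits) | man

    def classify(self, bits):
        """Name the broad category that a bit pattern falls into."""
        _, exp, man = self.unpack(bits)
        exp_max = (1 << self.exp_bits) - 1
        if exp == exp_max:
            return "nan" if man else "inf"
        if exp == 0:
            return "subnormal" if man else "zero"
        return "normal"


# Each precision is one line, rather than a copy of the codebase.
HALF = FPFormat(exp_bits=5, man_bits=10)
SINGLE = FPFormat(exp_bits=8, man_bits=23)
DOUBLE = FPFormat(exp_bits=11, man_bits=52)
QUAD = FPFormat(exp_bits=15, man_bits=112)
```

For instance, `SINGLE.unpack(0x3FC00000)` (the bit pattern of 1.5) gives sign 0, biased exponent 127 and mantissa `0x400000`. The real work (rounding, normalisation, tininess and so on) is not covered here; the sketch only shows that the format itself need not be hard-coded per precision.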