--- /dev/null
+# IEEE 754 Floating Point ALU
+
+A fundamental requirement of a GPU is to handle floating-point
+numbers. In a discrete GPU, compliance with standards is not
+strictly necessary: NVIDIA, for example, added 12-bit floating-point
+types because they save power and are accurate enough for certain
+computations.
+
+However, for a hybrid CPU / GPU, we definitely need to be standards
+compliant. IEEE 754 is the industry-wide standard, so that is
+what we need. A quick Google search reveals only a few options in
+the libre / open world:
+
+* Rudolf Usselmann's [asics.ws](http://asics.ws)
+ [single-precision FPU](https://opencores.org/projects/fpu),
+ written in Verilog
+* Jon Dawson's [single / double precision FPU](https://github.com/dawsonjon/fpu),
+ also written in Verilog
+* John Hauser's [hardfloat](https://github.com/ucb-bar/berkeley-hardfloat/)
+ library, written in Chisel3.
+* The SiFive FPU (also written in Chisel3).
+
+Jon's library has over 100 million test vectors per function, which is
+extremely impressive. It also looks very clear and readable. However,
+it does not look like it is pipelined, and we also need square-root.
+
+Rudi's library has the advantage that it is pipelined. However, Rudi
+is an extremely knowledgeable engineer, and the code that he has written
+looks to be heavily optimised, based on decades of experience. If we ever
+needed to adapt it, it would be extremely difficult to do so.
+
+John Hauser is well-known for his in-depth knowledge of IEEE 754. He
+also wrote a softfloat library and an extensive test suite
+(berkeley-testfloat-3). All his work can be found under the
+[ucb-bar](https://github.com/ucb-bar/) repositories.
+
+SiFive's work is well-known for being functionally correct yet designed
+in such a way that it is almost impossible to work with. Documentation
+and code comments are completely lacking, and an extremely high degree
+of abstraction is used (extensive undocumented use of object-orientated
+multiple inheritance) that makes maintenance a complete nightmare, creating
+a heavy reliance on people with significantly above-average intelligence,
+prodigious expertise, and decades of experience.
+In addition, Chisel3 is converted to Verilog that is almost impossible
+to read, as the conversion process goes through a state machine, rather
+than a language translation process.
+
+So we are slightly stuck for options. One of the key goals of this project
+is to create code that is long-term maintainable. That means clear
+explanations, plenty of links to resources, well-documented and well-commented
+code (but not completely overloaded with comments).
+
+Much as it pains me to have to say it, we need to start from scratch.
+
+This is not something that I like doing. I like to save time and effort
+by leveraging pre-existing resources. As I
+[described on-list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000495.html),
+a huge amount of time and resources can be saved through a semi-automated
+conversion of a well-chosen pre-existing library, so that the risk
+of introducing errors and making design mistakes is minimised.
+However, as Aleksander rightly points out, such an approach
+leaves little room for *understanding* of the resultant (translated)
+library, which makes *collaborative* development unlikely to succeed.
+Think about it: if no one working on
+the conversion actually understands what it is that they are converting,
+and errors occur (unit tests do not pass), what do they do?
+
+(*A reflective moment here: again, for me, I am entering new territory,
+taking into account input from team members that contradicts past
+extensive experience. It's quite uncomfortable and exciting at the
+same time*)
+
+One thing that is clear: the ability to use the object-orientated features
+of Python (through nmigen) will be critically important here, to avoid
+significant code proliferation. Jon's library, for example, duplicates
+the entirety of the codebase for single and double precision, and we
+will need *half* precision as well.
+
+This is one very important lesson to be learned from the extensive
+use of Chisel3 by SiFive: the FPU library is entirely abstracted,
+such that quad-precision and half-precision variants may
+be created with only a few strategic lines of code instead of massive
+duplication of hundreds to thousands of lines. The obvious disadvantage
+of duplication is that errors creep into the near-copies and get fixed
+in one but not the other.
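
As a rough illustration of the kind of parameterisation meant here, the sketch below describes an IEEE 754 binary format by just two numbers (exponent width and stored mantissa width) and derives everything else from them. It is plain Python, not nmigen, and it is not the project's actual design: the class name `FloatFormat` and its methods are hypothetical, chosen only for this example. The field widths themselves (5/10, 8/23, 11/52) are the standard binary16, binary32 and binary64 layouts.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """Hypothetical description of an IEEE 754 binary interchange format."""
    exp_bits: int  # width of the biased exponent field
    man_bits: int  # width of the stored (trailing) mantissa field

    @property
    def width(self):
        # one sign bit, then the exponent, then the mantissa
        return 1 + self.exp_bits + self.man_bits

    @property
    def bias(self):
        return (1 << (self.exp_bits - 1)) - 1

    def unpack(self, bits):
        """Split a raw bit pattern into (sign, biased exponent, mantissa)."""
        man = bits & ((1 << self.man_bits) - 1)
        exp = (bits >> self.man_bits) & ((1 << self.exp_bits) - 1)
        sign = (bits >> (self.width - 1)) & 1
        return sign, exp, man

    def pack(self, sign, exp, man):
        """Inverse of unpack: reassemble the three fields into one integer."""
        return (sign << (self.width - 1)) | (exp << self.man_bits) | man

    def classify(self, bits):
        """Name the class of value that a bit pattern encodes."""
        _, exp, man = self.unpack(bits)
        if exp == (1 << self.exp_bits) - 1:  # all-ones exponent
            return "nan" if man else "inf"
        if exp == 0:                         # all-zeros exponent
            return "subnormal" if man else "zero"
        return "normal"


# Each precision is one line, not a copy of the whole codebase.
HALF = FloatFormat(exp_bits=5, man_bits=10)
SINGLE = FloatFormat(exp_bits=8, man_bits=23)
DOUBLE = FloatFormat(exp_bits=11, man_bits=52)

if __name__ == "__main__":
    # Use the struct module to obtain real bit patterns for 1.5.
    for fmt, pack_code, int_code in ((HALF, "<e", "<H"),
                                     (SINGLE, "<f", "<I"),
                                     (DOUBLE, "<d", "<Q")):
        bits = struct.unpack(int_code, struct.pack(pack_code, 1.5))[0]
        print(fmt.width, fmt.bias, fmt.unpack(bits), fmt.classify(bits))
```

The same idea would presumably carry over to hardware description: an nmigen module could take such a format object as a constructor argument and size its signals from it, so that a fix made once applies to every precision.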
+
+There is a lot to be done here, mostly not in actual "work", but simply
+in *understanding* how to go about doing IEEE 754 arithmetic. Rounding,
+overflow, tininess, packing, normalisation, let alone optimisation: it's
+a really big job. We'll just have to see how it goes.
+