From: Luke Kenneth Casson Leighton
Date: Fri, 15 Feb 2019 06:51:06 +0000 (+0000)
Subject: add summary update
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=51f0f8b8051a7bde3c94efa0ab2f9936a5c5f9ae;p=crowdsupply.git

add summary update
---

diff --git a/updates/015_2019feb15_summary.mdwn b/updates/015_2019feb15_summary.mdwn
new file mode 100644
index 0000000..6f85730
--- /dev/null
+++ b/updates/015_2019feb15_summary.mdwn
@@ -0,0 +1,108 @@

# IEEE754 Floating-Point, Virtual Memory, SimpleV Prefixing

This update covers three different topics, as we now have four people
working on different areas. Daniel is designing a Virtual Memory TLB;
Aleksander and I are working on an IEEE754 FPU; and Jacob has been
designing a 48-bit parallel-prefixed RISC-V ISA extension.

# IEEE754 FPU

Prior to Aleksander joining the team, we evaluated
[nmigen](https://github.com/m-labs/nmigen) by taking an existing
Verilog project (Jacob's rv32 processor) and converting it.
To give Aleksander a way to bootstrap up to understanding nmigen
as well as IEEE754, I decided to do a
[similar conversion](https://git.libre-riscv.org/?p=ieee754fpu.git;a=tree;f=src/add)
of Jon Dawson's
[adder.v](https://github.com/dawsonjon/fpu/blob/master/adder/adder.v).
It turned out
[pretty well](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000551.html).

We added a lot of comments so that each stage of the FP add is easy to
understand. A Python class, FPNum, was created that abstracts out a
significant amount of repetitive Verilog, particularly the packing and
unpacking of the result. Common patterns for recognising (or creating)
+/- INF or NaN were each given a function with an easily-recognisable
name in the FPNum class, returning nmigen code fragments: a technique
that is flat-out impossible to achieve in plain Verilog.

We have already identified that the code for Jon's 32-bit and 64-bit FPUs
is near-identical, the only difference being the constants defining the
widths of the mantissa and exponent. More than that, around 80 to 90% of
the code is duplicated between the adder, divider and multiplier: the
normalisation and denormalisation stages, as well as packing and
unpacking, are absolutely identical. Consequently these stages can be
abstracted out into base classes.
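To give a flavour of what this looks like, below is a heavily simplified
sketch of the idea. The class, signal and method names are illustrative
only, and details such as the hidden mantissa bit and denormalised numbers
are omitted (the real FPNum code in the ieee754fpu repository handles
considerably more). The point is that helpers such as is_nan() and
is_inf() hand back nmigen expressions and statements that the adder's
state machine can drop straight into place:

```python
# Heavily simplified, illustrative sketch: not the actual FPNum class.
from nmigen import Signal, Cat, Const


class FPNum:
    """An IEEE754 number unpacked into sign/exponent/mantissa signals,
    with helpers that return nmigen expressions and statements."""

    def __init__(self, e_width=8, m_width=23):   # defaults: single precision
        self.e_width = e_width
        self.m_width = m_width
        self.width = 1 + e_width + m_width       # 32 vs 64 bit is just constants
        self.s = Signal()                        # sign
        self.e = Signal(e_width)                 # biased exponent
        self.m = Signal(m_width)                 # mantissa (hidden bit omitted)
        self.v = Signal(self.width)              # packed IEEE754 bit-pattern
        self.e_max = (1 << e_width) - 1          # all-ones exponent

    def decode(self):
        """Statements that unpack the packed value v into s, e and m."""
        return [self.m.eq(self.v[0:self.m_width]),
                self.e.eq(self.v[self.m_width:self.m_width + self.e_width]),
                self.s.eq(self.v[self.width - 1])]

    def create(self, s, e, m):
        """Statement that packs sign, exponent and mantissa back into v."""
        return self.v.eq(Cat(m, e, s))

    def is_nan(self):
        """Expression: all-ones exponent with a non-zero mantissa."""
        return (self.e == self.e_max) & (self.m != 0)

    def is_inf(self):
        """Expression: all-ones exponent with a zero mantissa."""
        return (self.e == self.e_max) & (self.m == 0)

    def nan(self):
        """Statement that sets v to a quiet NaN, keeping the current sign."""
        return self.create(self.s,
                           Const(self.e_max, self.e_width),
                           Const(1 << (self.m_width - 1), self.m_width))
```

Inside a module the usage then reads almost like the IEEE754 specification
itself, along the lines of `m.d.comb += a.decode()` followed by
`with m.If(a.is_nan() | b.is_nan()): m.d.sync += z.nan()`, which is what
made the conversion worthwhile.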
# Virtual Memory / TLB

A [TLB](https://en.wikipedia.org/wiki/Translation_lookaside_buffer)
is a Translation Lookaside Buffer. It's the fundamental basis of
Virtual Memory. We're not doing an Embedded Controller here, so
Virtual Memory is essential. Daniel has been
[researching](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-February/000490.html)
this area and found an extremely useful paper explaining that standard
CPU Virtual Memory strategies typically fail extremely badly when naively
transferred over to GPUs.

The reason for the failure is explained in the paper: GPU workloads
typically involve considerable amounts of parallel data that is processed
once *and only once*. Most standard scalar CPU Virtual Memory strategies
are based on the assumption that areas of memory (code and data) will be
accessed several times in quick succession.

We will therefore need to put a *lot* of thought into what we are going
to do here. A good lead to follow is the hint that if one GPU thread needs
a region of data then, because the workload is parallel, it is extremely
likely that nearby data could usefully be loaded in advance.

This may be an area where a software-managed TLB has an advantage over
a hardware one: different strategies can be deployed, and changed, even
after the hardware is finalised. Again, we will just have to see how it
goes.

# Simple-V Prefix Proposal (v2)

In looking at the SPIR-V to LLVM-IR compiler, Jacob identified that LLVM's
IR backend really does not have the ability to support what are effectively
stack-based, wholesale changes to the meaning of instructions. In addition,
the setup time of the SimpleV extension (the number of instructions required
to set up the changed meaning of subsequent instructions) is significant.

The "Prefix" Proposal therefore compromises by introducing a 48-bit
(and also a 32-bit "Compressed") parallel instruction format. We spent
considerable time going over the options, and the result is a
[second proposed prefix scheme](https://salsa.debian.org/Kazan-team/kazan/blob/master/docs/SVprefix%20Proposal.rst).

What's really nice is that Simon Moll, the author of an LLVM Vector
Predication proposal, is
[taking SimpleV into consideration](https://lists.llvm.org/pipermail/llvm-dev/2019-February/129973.html).
A GPU workload (and the Vulkan API) involves a considerable number of
3-long and 4-long floating-point vectors. However, these are processed in
*parallel*, so there are *multiple* 3-long and 4-long vectors in flight at
once. It makes no sense to have predication bits down to the granularity of
the individual elements *within* those vectors: what is needed is a vector
mask that allocates one bit per 3-long or 4-long "group" (a tiny worked
model of this is included at the end of this update). Or, more to the
point: there is a performance penalty associated with having to allocate
mask bits right down to the level of the individual elements. So it is
really nice that Simon is taking this into consideration.

# Summary

So there is quite a lot going on, and we're making progress. There are
a lot of unknowns: that's okay. It's a complex project, and the main thing
is to just keep going. Sadly, a significant number of reddit and other
forum threads are full of people saying things like "this project could not
possibly succeed, because they don't have the billion-dollar resources
of e.g. Intel". So... um, should we stop? Should we just quit, then, and
not even try?

That's the thing when something has never been done before: you simply
cannot say "It Won't Possibly Work", because that's just not how innovation
works. You don't *know* whether it will or won't work, and you simply do not
have enough information to say categorically, one way or the other, until
you try.

In short: failure and success are *both* their own self-fulfilling
prophecies.
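As a small postscript to the Simple-V section above, here is the promised
worked model of group-level predication. It is purely illustrative Python,
nothing to do with the actual SV Prefix encoding or any real implementation:
it simply shows that with one predicate bit per 4-long group, three vec4
additions need a 3-bit mask instead of a 12-bit per-element one.

```python
# Purely illustrative: one predicate bit per 4-long vector ("vec4"),
# rather than one bit per individual element.
def vec4_add_masked(groups_a, groups_b, mask):
    result = []
    for i, (a, b) in enumerate(zip(groups_a, groups_b)):
        if (mask >> i) & 1:                        # one bit covers the whole vec4
            result.append([x + y for x, y in zip(a, b)])
        else:
            result.append(list(a))                 # masked-out group is untouched
    return result


# three vec4s, middle group masked out: only 3 mask bits needed, not 12
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
b = [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30]]
print(vec4_add_masked(a, b, mask=0b101))
# -> [[11, 12, 13, 14], [5, 6, 7, 8], [39, 40, 41, 42]]
```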