X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=updates%2F023_2020mar26_decoder_emulator_started.mdwn;h=83a282da6d118e2342ac25f84b982d5618807b44;hb=3381351a97f89968f6af01fc54cd1fa1851f24cc;hp=502882cb896a2b9e08ee3c24373fa1307139021b;hpb=3d9e21f6d618b91f53edc2a7d2b7320e13135bd9;p=crowdsupply.git diff --git a/updates/023_2020mar26_decoder_emulator_started.mdwn b/updates/023_2020mar26_decoder_emulator_started.mdwn index 502882c..83a282d 100644 --- a/updates/023_2020mar26_decoder_emulator_started.mdwn +++ b/updates/023_2020mar26_decoder_emulator_started.mdwn @@ -26,6 +26,8 @@ Here's the summary (if it can be called a summary): * Jacob's simple-soft-float library growing [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258) and python bindings. +* Kazan, the Vulkan driver Jacob is writing, is getting + a [new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161). * A Conference call with OpenPOWER Foundation Director, Hugh, and Timothy Pearson from RaptorCS has been established every two weeks. * The OpenPOWER Foundation is also running some open @@ -149,16 +151,76 @@ level hierarchy. [Explaining this](http://bugs.libre-riscv.org/show_bug.cgi?id=178#c146) to Jean-Paul was amusing and challenging. Much bashing of heads against -walls and keyboards was involved. +walls and keyboards was involved. The basic plan: rather than have +coriolis2 perform an *entire* layout, in a flat and all-or-nothing fashion, +we need a much more subtle fine-grained approach, where *sub-blocks* are +laid-out, then *included* at a given level of hierarchy as "pre-done blocks". + +Save and repeat. + +This apparently had never been done before, and explaining it in words was +extremely challenging. A massive hack (actively editing the underlying +HDL files temporarily in between tasks) was the only way to illustrate it.
+However once the lightbulb went on, Jean-Paul was able to get coriolis2's +c++ code into shape extremely rapidly, and this alone has opened up an +*entire new avenue* of potential for coriolis2 to be used in industry +for doing much larger ASICs. Which is precisely the kind of thing that +our NLNet sponsors (and the EU, from the Horizon 2020 Grant) love. hooray. +Now if only we could actually go to a conference and talk about it. # POWER ISA decoder and Simulator -TODO +*(kindly written by Michael)* + +The decoder we have is based on that of IBM's +[microwatt reference design](https://github.com/antonblanchard/microwatt). +As microwatt's decoder is quite regular, consisting of a bunch of large +switch statements returning fields of a struct, we elected not to +pursue a direct conversion of the VHDL to nmigen. Instead, we +extracted the information in the switch statements into several +[CSV tables](https://libre-riscv.org/openpower/isatables/), +and leveraged nmigen to construct the decoder from these +tables. We applied the same technique to extract the subfields +(register numbers, branch offset, immediates, etc.) from the +instruction, where Luke converted the information in the POWER ISA +specification to text, and wrote a module in python to extract those +fields from an instruction. + +To test the decoder, we initially verified it against the tables we +extracted, and manually against the [POWER ISA +specification](https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0). Later +however, we came up with the idea of [verifying the +decoder](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76) +against the output of the GNU assembler. This is done by selecting an +instruction type (integer reg/reg, integer immediate, load store, +etc), and randomly selecting the opcode, registers, immediates, and +other operands. 
We then feed this instruction to GNU AS to assemble, +and then the assembled instruction is sent to our decoder. From this, +we can then verify that the output of the decoder matches what was +generated earlier. + +We also explored using a similar idea to test the functionality of the +entire SOC. By using the [QEMU](https://www.qemu.org/) PowerPC +emulator, we can compare the execution of our SOC against that of the +emulator to verify that our decoder and backend are working correctly. +We would write snippets of test code (or potentially randomly generate +instructions) and send the resulting binary to both the SOC and +QEMU. We would then simulate our SOC until it was finished executing +instructions, and use QEMU's gdb interface to do the same. We would +then use QEMU's gdb interface to compare the register file and memory +with that of our SOC to verify that it is working correctly. I did +some experimentation using this technique to verify a [rudimentary +simulator](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=aadaf667eff7317b1aa514993cd82b9abedf1047;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76) +of the SOC backend, and it seemed to work quite well. + +*(Note from Luke: this automated approach, taking either other people's +regularly-written code or actual PDF specifications, not only saves us a +vast amount of time, it also ensures that our implementation is +correct and does not contain transcription errors).* # simple-soft-float Library and POWER FP emulation -The -[simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float) +The [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float) library is a floating-point library Jacob wrote with the intention of being a reference implementation of IEEE 754 for hardware testing purposes.
It's specifically designed to be written to be easier to @@ -198,7 +260,7 @@ rewriting a lot of the status-flag handling code since Power supports a much larger set of floating-point status flags and exceptions than most other ISAs. -Thanks to RaptorCS for giving us remote access to a Power9 system, +Thanks to Raptor CS for giving us remote access to a Power9 system, since that makes it much easier verifying that the test cases are correct (more on this below). @@ -206,17 +268,102 @@ API Docs for stable releases of both [simple-soft-float](https://docs.rs/simple-soft-float) and [algebraics](https://docs.rs/algebraics) are available on docs.rs. +The algebraics library was chosen as the +[Crate of the Week for October 8, 2019 on This Week in +Rust](https://this-week-in-rust.org/blog/2019/10/08/this-week-in-rust-307/#crate-of-the-week). + One of the really important things about these libraries: they're not -specifically coded exclusively for Libre-SOC: like softfloat-3 itself +specifically coded exclusively for Libre-SOC: like Berkeley softfloat itself (and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git)) they're intended for *general-purpose* use by other projects. These are exactly the kinds of side-benefits for the wider Libre community that sponsorship, from individuals, Foundations (such as NLNet) and Companies (such as Purism and Raptor CS) brings. +# Kazan Getting a New Shader Compiler IR + +After spending several weeks only to discover that translating directly from +SPIR-V to LLVM IR, Vectorizing, and all the other front-end stuff all in a +single step is not really feasible, Jacob has switched to [creating a new +shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161) to allow +decomposing the translation process into several smaller steps. + +The IR and +SPIR-V to IR translator are being written simultaneously, since that allows +more easily finding the things that need to be represented in the shader +compiler IR. 
Because writing both the IR and the SPIR-V translator together is +such a big task, we decided to pick an arbitrary point ([translating a totally +trivial shader into the IR](http://bugs.libre-riscv.org/show_bug.cgi?id=177)) +and split it into tasks at that point so Jacob would be able to get paid +after several months of work. + +The IR uses structured control-flow inspired by WebAssembly's control-flow +constructs as well as +[SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form) but, instead +of using traditional phi instructions, it uses block and loop parameters and +return values (inspired by [Cranelift's EBB +parameters](https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/docs/ir.md#static-single-assignment-form) +as well as both the [Rust](https://www.rust-lang.org/) and [Lua](https://www.lua.org/) programming languages). + +The IR has a single pointer type for all data pointers (`data_ptr`), unlike LLVM IR where pointer types have a type they point to (like `i32*`, where `i32` is the type the pointer points to). + +Because having a serialized form of the IR is important for any good IR, like +LLVM IR, it has a user-friendly textual form that can be both read and +written without losing any information (assuming the IR is valid, comments are +ignored). A binary form may be added later. + +Some example IR is [available in the Kazan repo](https://salsa.debian.org/Kazan-team/kazan/-/blob/master/docs/Shader%20Compiler%20IR%20Example.md). + # OpenPOWER Conference calls -TODO +We've now established a routine two-week conference call with Hugh Blemings, +OpenPOWER Foundation Director, and Timothy Pearson, CEO of Raptor CS. This +allows us to keep up-to-date (each way) on both our new venture and also +the newly-announced OpenPOWER Foundation effort as it progresses.
+ +One of the most important things that we, Libre-SOC, need, and are +discussing with Hugh and Tim is: a way to switch on/off functionality +in the (limited) 32-bit opcode space, so that we have one mode for +"POWER 3.0B compliance" and another for "things that are absolutely +essential to make a decent GPU". With these two being strongly +mutually exclusive, this is just absolutely critical. + +Khronos Vulkan Floating-point Compliance is, for example, critical not +just from a Khronos Trademark Compliance perspective, it's essential +from a power-saving and thus commercial success perspective. If we +have absolute strict compliance with IEEE754 for POWER 3.0B, this will +result in far more silicon than any commercially-competitive GPU on +the market, and we will not be able to sell product. Thus it is +*commercially* essential to be able to swap between POWER Compliance +and Khronos Compliance *at the silicon level*. + +POWER 3.0B does not have c++ style LR/SC atomic operations for example, +and if we have half a **million** 3D GPU data structures **per second** +that need SMP-level inter-core mutexes, and the current POWER 3.0B +multi-instruction atomic operations are used - conforming strictly to +the standard - we're highly likely to see 10 to 15 **percent** of processing +power consumed on spin-locking. Finding out from Tim on one of these +calls that c++ atomics is something that end-users +have been asking about is therefore a good sign. + +New and essential features that could well end up in a future version +of the POWER ISA *need* to be firewalled in a clean way, and we've been +asked to [draft a letter](https://libre-riscv.org/openpower/isans_letter/) +to some of the (very busy) engineers with a huge amount of knowledge +and experience inside IBM, for them to consider. Some help in reviewing +it would be greatly appreciated. + +These and many other things are why the calls with Tim and Hugh are a +good idea.
The amazing thing is that they're taking us seriously, and +we can discuss things like those above with them. + +Other nice things we learned (more on this below) are that Epic Games +and RaptorCS are collaborating to get POWER9 supported in Unreal Engine. +And that the idea has been very tentatively considered to use our design +for the "boot management" processor, running +[OpenBMC](https://github.com/openbmc/openbmc). These are early days, +it's just ideas, ok! Aside from anything else, we actually have to get a chip +done, first. # OpenPower Virtual Coffee Meetings @@ -245,7 +392,25 @@ we help each other. # Sponsorship by RaptorCS with a TALOS II Workstation -TODO http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005291.html +With many thanks to Timothy from +[RaptorCS](https://raptorcs.com), we've a new shiny +online server that needs +[setting up](http://bugs.libre-riscv.org/show_bug.cgi?id=265). +This machine is not just a "nice-to-have", it's actually essential for +us to be able to verify against. As you can see in the bugreport, the idea +is to bootstrap our way from running IEEE754 FP on a *POWER* system +(typically using GNU libm), verifying Jacob's algorithmic FP library +particularly and specifically for its rounding modes and exception modes. + +Once that is done, then apart from having a general-purpose library that +is compliant with POWER IEEE754 which *anyone else can use*, we can use +that to run unit tests against our +[hardware IEEE754 FP library](https://git.libre-riscv.org/?p=ieee754fpu.git;a=summary) - +again, a resource that anyone may use in any arbitrary project - verifying +that it is also correct. This stepping-stone "bootstrap" method we are +deploying all over the place; however, to do so we need access to resources +that have correctly-compliant implementations in the first place. Thus, +the critical importance of access to a TALOS II POWER9 workstation.
# Epic Megagrants @@ -287,7 +452,7 @@ company to license a DDR3 PHY for a limited one-time use and only a 32-bit wide interface, we can contract SymbioticEDA to *design* a DDR3 PHY for us, which both we *and the rest of the worldwide Silicon Community can use without limitation* because we will ask SymbioticEDA to make the design -libre-licensed, for anyone to use. +(and layout) libre-licensed, for anyone to use. USD 250,000 pays for the mask charges that will allow us to do the 40nm quad-core ASIC that we have on the roadmap for the second chip. USD @@ -324,7 +489,46 @@ separate lists (below). # Georgia Tech CREATE-X -TODO +(*This section kindly written by Yehowshua*) + +Yehowshua is a student at Georgia Tech currently pursuing a Masters in +Computer Engineering - to graduate this summer. He had started working +on LibreSOC in December and wanted to get LibreSOC more funding so +he could work on it full time. + +He originally asked if the ECE Chair at Georgia Tech would be willing +to fund an in-department effort to deliver an SOC in collaboration +with LibreSOC (an idea to which he was quite receptive). Through Luke, +Yehowshua got in contact with Christopher Klaus who suggested Yehowshua +should look into Klaus's startup accelerator program Create-X and perhaps +consider taking LibreSOC down the startup route. Robert Rhinehart, who +had funded LibreSOC a little in the past (*note from Luke: he donated +the ZC706 and also funded modernisation of Richard Herveille's excellent +[vga_lcd](https://github.com/RoaLogic/vga_lcd) Library*) +also suggested that Yehowshua +incorporate LibreSOC with help from Create-X and said he would be willing +to be a seed investor. All this happened by February. + +As of March, Yehowshua has been talking with Robert about what type of +customers would be interested in LibreSOC. Robert is largely interested in +biological applications. Yehowshua also had a couple of meetings with Rahul +from Create-X.
Yehowshua has started the incorporation of LibreSOC. The +parent company will probably be called Systèmes-Libres with LibreSOC +simply being one of the products we will offer. Yehowshua also attended +HPCA in late February and mentioned LibreSOC during his talk. People +seemed to find the idea quite interesting. + +He will later be speaking with some well-known startup lawyers that have +an HQ in Atlanta to discuss business-related things such as S corps, +C corps, taxes, wages, equity etc… + +Yehowshua plans for Systèmes-Libres to hire full-time employees. Part-time +work on Libre-SOC will still be possible through donations and +support from NLNet and companies like Purism. + +Currently, Yehowshua plans to take the Create-X summer launch program +and fund Systèmes-Libres by August. Full-time wages would probably be +set around 100k USD. # LOAD/STORE Buffer and 6600 design documentation @@ -337,8 +541,10 @@ as well as the key differences is particularly important. At the very least, the extremely simple and highly effective hardware but timing-critical design aspects of the circular loops in the 6600 were recognised by James Thornton (the co-designer of the 6600) as being paradoxically challenging -to understand why so few gates could be so effective. Consequently, -documenting it just to be able to *develop* it is extremely important. +to understand why so few gates could be so effective (being as they were, +literally the world's first ever out-of-order superscalar architecture). +Consequently, documenting it just to be able to *develop* it is extremely +important. We're getting to the point where we need to connect the LOAD/STORE Computation Units up to an actual memory architecture. We've chosen @@ -398,9 +604,10 @@ from both Mitch Alsup and Staf Verhaegen.
(Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/) to create Libre-licensed Cell Libraries, busting through one of the - -many - layers of NDAs and reducing NREs for ASIC development: I helped him -put in the submission, and he was really happy to do the Cell Libraries -that we will be using for LibreSOC's 180nm test tape-out in October 2020.) +many - layers of NDAs and reducing NREs and unnecessary and artificial +barriers for ASIC development: I helped him put in the submission, and +he was really happy to do the Cell Libraries that we will be using for +LibreSOC's 180nm test tape-out in October 2020.) # Public-Inbox and Domain Migration @@ -414,8 +621,10 @@ Hence why we are going to the trouble of installing public-inbox, due to its ability to not only have a mailing list entirely stored in a git repository, the "web service" which provides access to that git-backed archive can be not only mirrored elsewhere, it can be *run locally on -your own offline machine*. This in combination with the right mailer -setup can store-and-forward any replies to the (offline-copied) messages, +your own local machine* even when offline. This in combination +with the right mailer setup can store-and-forward any replies to the +(offline-copied) messages, such that they can be sent when internet +connectivity is restored, yet remain a productive collaborative developer. Now you know why we absolutely do not accept "slack", or other proprietary "online oh-so-convenient" service. Not only is it highly inappropriate for