updates/023_2020mar26_decoder_emulator_started.mdwn

   1 So many things happened since the last update they actually need to go
   2 in the main update, even in summary form.  One big thing:
   3 [Raptor CS](https://www.raptorcs.com/)
   4 sponsored us with remote access to a Monster spec'd TALOS II Workstation!
   5
   6 # Introduction
   7
   8 Here's the summary (if it can be called a summary):
   9
  10 * [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html)
  11   that we got the funding (which is open to anyone - hint, hint) resulted in
  12   at least three people reaching out to join the team.  "We don't need
  13   permission to own our own hardware" got a *really* positive reaction.
  14 * New team member, Jock (hello Jock!) starts on the coriolis2 layout,
  15   after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2
  16   can be used.  This resulted in a
  17   [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
  18   [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
  19 * Work has started on the
  20   [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
  21   verified through
  22   [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!)
  23   and on a mini-simulator
  24   [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
  25   for verification.
  26 * Jacob's simple-soft-float library growing
  27   [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
  28   and python bindings.
  29 * Kazan, the Vulkan driver Jacob is writing, is getting
  30   a [new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161).
  31 * A Conference call with OpenPOWER Foundation Director, Hugh, and Timothy
  32   Pearson from RaptorCS has been established every two weeks.
  33 * The OpenPOWER Foundation is also running some open
  34   ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
  35   weekly round-table calls for anyone interested, generally, in OpenPOWER
  36   development.
  37 * Tim sponsors our team with access to a Monster Talos II system with a
  38   whopping 128 GB RAM.  htop lists a staggering 72 cores (18 real
  39   with 4-way hyperthreading).
  40 * [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
  41   reached out (hello!) to say they're still considering our
  42   request.
  43 * A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted
  44   in the completion of the
  45   [Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
  46   and a
  47   [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
  48   of bug reports to the list.
  49 * Immanuel Yehowshua is participating in the Georgia Tech
  50   [Create-X](https://create-x.gatech.edu/) Programme, and is establishing
  51   a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
  52   Funding.
  53 * A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
  54   design and
  55   [further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257)
  56   including on
  57   [comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE)
  58   inspired additional writeup
  59   on the
  60   [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
  61   page.
  62 * [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
  63   installed successfully on the server, which is in the process of
  64   moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182)
  65   [Libre-SOC](http://libre-soc.org)
  66 * Build Servers have been set up with
  67   [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
  68   being established
  69
  70 Well dang, as you can see, suddenly it just went ballistic.  There's
  71 almost certainly things left off the list.  For such a small team there's
  72 a heck of a lot going on.  We have an awful lot to do, in a short amount
  73 of time: the 180nm tape-out is in October 2020 - only 7 months away.
  74
  75 With this update we're doing something slightly different: a request
  76 has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
  77 to say a little bit about what each of them is doing.  This also helps me
  78 because these updates do take quite a bit of time to write.
  79
  80 # NLNet Funding announcement
  81
  82 An announcement went out
  83 [last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
  84 that we'd applied for funding, and we got some great responses and
  85 feedback (such as "don't use patented AXI4").  The second time, we
  86 sent out a "we got it!" message and got some really nice private and
  87 public replies, as well as requests from people to join the team.
  88 More on that when it happens.
  89
  90 # Coriolis2 experimentation started
  91
  92 Jock, a really enthusiastic and clearly skilled and experienced python
  93 developer, has this to say about coriolis2:
  94
  95     As a humble Python developer, I understand the unique status and
  96     significance of the Coriolis project, nevertheless I cannot help
  97     but notice that it has a huge room for improvement. I genuinely hope
  98     that my participation in libre-riscv will also help improve Coriolis.
  99
 100 This was the short version, with a much more
 101 [detailed insight](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005478.html)
 102 listed here which would do well as a bugreport.  However the time it would
 103 take is quite significant.  We do have funding available from NLNet,
 104 so if there is anyone that would like to take this on, under the supervision
 105 of Jean-Paul at LIP6.fr, we can look at facilitating that.
 106
 107 One of the key insights that Jock came up with was that the coding style,
 108 whilst consistent, is something that specifically has to be learned, and,
 109 as such, being contrary to PEP8 in so many ways, creates an artificially
 110 high barrier and learning curve.
 111
 112 Even particularly experienced cross-language developers such as
 113 myself tend to be able to *read* such code, but editing it, when
 114 commas separating list items are on the beginning of lines, results in
 115 syntax errors automatically introduced *without thinking* because we
 116 automatically add them *at the end* because it looks like one is missing.
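
To make the hazard concrete, here is a small sketch of the two comma styles. The list contents are invented for illustration and are not taken from the coriolis2 source.

```python
# Leading-comma style: each separator sits at the START of the next line.
blocks_leading = [ "fpmul"
                 , "regfile"
                 , "alu"
                 ]

# PEP8 style: separators at the END of each line, with a trailing comma.
blocks_pep8 = [
    "fpmul",
    "regfile",
    "alu",
]

# Both spellings build exactly the same list.
assert blocks_leading == blocks_pep8
```

Adding a comma by reflex after `"fpmul"` in the first form produces `"fpmul", , "regfile"`, which Python rejects as a syntax error.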
 117
 118 This is why we insisted on PEP8 in the
 119 [HDL workflow](http://libre-riscv.org/HDL_workflow) document.
 120
 121 Other than that: coriolis2 is actually extremely exciting to work with.
 122 Anyone who has done manual PCB layout will know quite how much of a relief
 123 it is to have auto-routing: this is what coriolis2 has by the bucket-load,
 124 *as well* as auto-placement.  We are looking at half a *million* objects
 125 (Cells) to place.  Without an auto-router / auto-placer this is just a
 126 flat-out impossible task.
 127
 128 The first step was to
 129 [learn and adapt coriolis2](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
 130 which was needed to find out how much work would be involved, as much as
 131 anything else, in order to be able to accurately assign the fixed budgets
 132 to the NLNet milestones.  Following on from that, when Jock joined,
 133 we needed to work out a compact way to express the
 134 [layout of blocks](http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44)
 135 and he's well on the way to achieving that.
 136
 137 Some of the pictures from coriolis2 are
[stunning](http://bugs.libre-riscv.org/attachment.cgi?id=29).  This was an
 139 experimental routing of the IEEE754 FP 64-bit multiplier.  It took
 140 5 minutes to run, and is around 50,000 gates: as big as most silicon
 141 ASICs that have formerly been done with Coriolis2, and 50% of the
 142 practical size that can be handed in one go to the auto-place/auto-router.
 143
 144 Other designs using coriolis2 have been of the form where the major "blocks"
 145 (such as FPMUL, or Register File) are laid-out automatically in a single-level
hierarchy, followed by full and total manual layout from that point onwards,
 147 in what is termed in the industry as a "Floorplan".
 148 With around 500,000 gates to do and many blocks being repeated, this approach
 149 is not viable for us.  We therefore need a *two* level or potentially three
 150 level hierarchy.
 151
 152 [Explaining this](http://bugs.libre-riscv.org/show_bug.cgi?id=178#c146)
 153 to Jean-Paul was amusing and challenging.  Much bashing of heads against
 154 walls and keyboards was involved.  The basic plan: rather than have
 155 coriolis2 perform an *entire* layout, in a flat and all-or-nothing fashion,
 156 we need a much more subtle fine-grained approach, where *sub-blocks* are
 157 laid-out, then *included* at a given level of hierarchy as "pre-done blocks".
 158
 159 Save and repeat.
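
The idea can be sketched in a few lines of plain Python. This is only an illustration of "lay out a sub-block once, then include it as a pre-done block": the `Block` class, the `layout` function and the cell counts are all hypothetical, and none of it is the coriolis2 API.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A hypothetical design block: a name, its own cells, and sub-blocks."""
    name: str
    cells: int = 0
    children: list = field(default_factory=list)


def layout(block, done, log):
    """Lay out `block` bottom-up, reusing any sub-block already laid out.

    `done` maps block name -> total cell count of the finished layout,
    standing in for a saved "pre-done block". `log` records each block that
    actually had to be placed and routed.
    """
    if block.name in done:
        return done[block.name]          # include the pre-done block as-is
    total = block.cells
    for child in block.children:
        total += layout(child, done, log)
    log.append(block.name)               # place-and-route happens once here
    done[block.name] = total
    return total


# Four identical ALUs inside a core, two identical cores inside the chip.
alu = Block("alu", cells=1000)
core = Block("core", cells=500, children=[alu, alu, alu, alu])
chip = Block("chip", cells=200, children=[core, core])

done, log = {}, []
total_cells = layout(chip, done, log)
```

Although the chip contains eight ALU instances and two cores, `log` ends up with only three entries: each distinct block is laid out once and then reused.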
 160
This apparently had never been done before, and explaining it in words was
extremely challenging.  A massive hack (actively editing the underlying
HDL files temporarily in between tasks) was the only way to illustrate it.
However once the lightbulb went on, Jean-Paul was able to get coriolis2's
C++ code into shape extremely rapidly, and this alone has opened up an
*entire new avenue* of potential for coriolis2 to be used in industry
for doing much larger ASICs.  Which is precisely the kind of thing that
our NLNet sponsors (and the EU, from the Horizon 2020 Grant) love.  Hooray.
 169 Now if only we could actually go to a conference and talk about it.
 170
 171 # POWER ISA decoder and Simulator
 172
 173 *(kindly written by Michael)*
 174
 175 The decoder we have is based on that of IBM's
 176 [microwatt reference design](https://github.com/antonblanchard/microwatt).
 177 As microwatt's decoder is quite regular, consisting of a bunch of large
 178 switch statements returning fields of a struct, we elected not to
 179 pursue a direct conversion of the VHDL to nmigen. Instead, we
 180 extracted the information in the switch statements into several
 181 [CSV tables](https://libre-riscv.org/openpower/isatables/),
 182 and leveraged nmigen to construct the decoder from these
 183 tables. We applied the same technique to extract the subfields
 184 (register numbers, branch offset, immediates, etc.) from the
 185 instruction, where Luke converted the information in the POWER ISA
 186 specification to text, and wrote a module in python to extract those
 187 fields from an instruction.
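
As a rough illustration of the table-driven approach, here is a minimal sketch in plain Python rather than nmigen. The CSV column names and the `unit` values are assumptions made up for this example, not the project's real schema; only the primary opcode numbers (14 for `addi`, 24 for `ori`, 32 for `lwz`) come from the POWER ISA.

```python
import csv
import io

# A tiny stand-in for one of the CSV tables; the column names and the
# "unit" values are illustrative, not the project's real schema.
MAJOR_CSV = """opcode,mnemonic,unit,form
14,addi,ALU,D
24,ori,LOGICAL,D
32,lwz,LDST,D
"""


def load_table(text):
    """Turn CSV rows into a dict keyed by the 6-bit primary opcode."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["opcode"]): row for row in reader}


def decode(insn, table):
    """Look up a 32-bit instruction word by its primary opcode (top 6 bits)."""
    primary = (insn >> 26) & 0x3F
    row = table.get(primary)
    if row is None:
        raise ValueError(f"no table entry for primary opcode {primary}")
    return row["mnemonic"], row["unit"]


table = load_table(MAJOR_CSV)
# 0x38600001 encodes "addi 3, 0, 1" (primary opcode 14).
decoded = decode(0x38600001, table)
```

The point of the design is that adding an instruction means adding a row to a table, not writing another branch of a hand-maintained switch statement.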
 188
 189 To test the decoder, we initially verified it against the tables we
 190 extracted, and manually against the [POWER ISA
 191 specification](https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0). Later
 192 however, we came up with the idea of
 193 [verifying the decoder](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
 194 against the output of the GNU assembler. This is done by selecting an
 195 instruction type (integer reg/reg, integer immediate, load store,
 196 etc), and randomly selecting the opcode, registers, immediates, and
 197 other operands. We then feed this instruction to GNU AS to assemble,
 198 and then the assembled instruction is sent to our decoder. From this,
 199 we can then verify that the output of the decoder matches what was
 200 generated earlier.
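
The shape of that test loop is sketched below. The real test calls GNU AS. So that this example runs anywhere, a small pure-Python encoder for the D-form `addi` instruction stands in for the assembler. All function names here are hypothetical.

```python
import random


def reference_assemble_addi(rt, ra, si):
    """Stand-in for GNU AS: encode D-form `addi rt, ra, si` as a 32-bit word."""
    return (14 << 26) | (rt << 21) | (ra << 16) | (si & 0xFFFF)


def decode_addi(insn):
    """Decoder under test: recover (rt, ra, si) from the instruction word."""
    rt = (insn >> 21) & 0x1F
    ra = (insn >> 16) & 0x1F
    si = insn & 0xFFFF
    if si & 0x8000:                      # sign-extend the 16-bit immediate
        si -= 0x10000
    return rt, ra, si


def run_random_checks(count, seed=0):
    """Generate random operands, assemble, decode, and compare."""
    rng = random.Random(seed)
    for _ in range(count):
        operands = (rng.randrange(32), rng.randrange(32),
                    rng.randrange(-0x8000, 0x8000))
        insn = reference_assemble_addi(*operands)
        assert decode_addi(insn) == operands, hex(insn)
    return count


checked = run_random_checks(1000)
```

Because the operands are chosen before assembly, the expected decoder output is known in advance and no hand-written test vectors are needed.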
 201
 202 We also explored using a similar idea to test the functionality of the
 203 entire SOC. By using the [QEMU](https://www.qemu.org/) PowerPC
 204 emulator, we can compare the execution of our SOC against that of the
 205 emulator to verify that our decoder and backend are working correctly.
 206 We would write snippets of test code (or potentially randomly generate
 207 instructions) and send the resulting binary to both the SOC and
 208 QEMU. We would then simulate our SOC until it was finished executing
 209 instructions, and use Qemu's gdb interface to do the same. We would
 210 then use Qemu's gdb interface to compare the register file and memory
 211 with that of our SOC to verify that it is working correctly. I did
 212 some experimentation using this technique to verify a
 213 [rudimentary simulator](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=aadaf667eff7317b1aa514993cd82b9abedf1047;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
 214 of the SOC backend, and it seemed to work quite well.
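
The final comparison step might look something like the following sketch. It assumes both sides have already been run to completion and their register files dumped into dictionaries. The `diff_state` helper and the register values are hypothetical, and talking to QEMU's gdb interface is left out entirely.

```python
def diff_state(ours, reference):
    """Return {name: (ours, reference)} for every value that disagrees."""
    names = sorted(set(ours) | set(reference))
    return {name: (ours.get(name), reference.get(name))
            for name in names
            if ours.get(name) != reference.get(name)}


# Hypothetical register dumps after running the same snippet on both sides.
soc_regs = {"r1": 0x1234, "r2": 0x5678, "r3": 0x68AC}
qemu_regs = {"r1": 0x1234, "r2": 0x5678, "r3": 0x68AC}
mismatches = diff_state(soc_regs, qemu_regs)
```

An empty result means the two agree. Anything else names exactly which register to go and look at.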
 215
 216 *(Note from Luke: this automated approach, taking either other people's
 217 regularly-written code or actual PDF specifications, not only saves us a
 218 vast amount of time, it also ensures that our implementation is
 219 correct and does not contain transcription errors).*
 220
 221 # simple-soft-float Library and POWER FP emulation
 222
 223 (*written kindly by Jacob*)
 224
 225 The [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
 226 library is a floating-point library Jacob wrote with the intention
 227 of being a reference implementation of IEEE 754 for hardware testing
purposes. It's specifically written to be easy to
understand rather than having the code obscured in pursuit of speed:
 230
 231 * Being easier to understand helps prevent bugs where the code does not
 232   match the IEEE spec.
 233 * It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
 234   library that Jacob wrote since that allows using numbers that behave
 235   like exact real numbers, making reasoning about the code simpler.
 236 * It is written in Rust rather than highly-macro-ified C, since that helps with
 237   readability since operations aren't obscured, as well as safety, since Rust
 238   proves at compile time that the code won't seg-fault unless you specifically
 239   opt-out of those guarantees by using `unsafe`.
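
The "exact real numbers" point can be illustrated with a short sketch. It uses Python's standard `fractions` module as a stand-in (it is not the algebraics library, and only covers rationals) to show the general pattern: compute the result exactly, then round once at the very end.

```python
from fractions import Fraction


def reference_add(a, b):
    """Add two finite floats exactly, then round once to binary64."""
    exact = Fraction(a) + Fraction(b)    # no rounding: Fractions are exact
    return float(exact)                  # single correctly-rounded conversion


# The exact sum of the doubles nearest 0.1 and 0.2 is not the double
# nearest 0.3, so the answer depends entirely on that one rounding step.
assert Fraction(0.1) + Fraction(0.2) != Fraction(0.3)
assert reference_add(0.1, 0.2) == 0.1 + 0.2
```

With the arithmetic itself exact, the only place the code can disagree with the IEEE spec is in the rounding step, which keeps the reasoning simple.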
 240
It currently supports 16, 32, 64 and 128-bit FP for RISC-V, along with
 242 having a `DynamicFloat` type which allows dynamically specifying all
 243 aspects of how a particular floating-point type behaves -- if one wanted,
 244 they could configure it as a 2048-bit floating-point type.
 245
 246 It also has Python bindings, thanks to the awesome
 247 [PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.
 248
 249 We decided to write simple-soft-float instead
 250 of extending the industry-standard [Berkeley
 251 softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
 252 because of a range of issues, including not supporting Power FP, requiring
 253 recompilation to switch which ISA is being emulated, not supporting
 254 all the required operations, architectural issues such as depending on
 255 global variables, etc. We are still testing simple-soft-float against
 256 Berkeley softfloat where we can, however, since Berkeley softfloat is
 257 widely used and highly likely to be correct.
 258
 259 simple-soft-float is [gaining support for Power
 260 FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
 261 rewriting a lot of the status-flag handling code since Power supports a
 262 much larger set of floating-point status flags and exceptions than most
 263 other ISAs.
 264
Thanks to Raptor CS for giving us remote access to a Power9 system,
since that makes it much easier to verify that the test cases are correct
(more on this below).
 268
 269 API Docs for stable releases of both
 270 [simple-soft-float](https://docs.rs/simple-soft-float) and
 271 [algebraics](https://docs.rs/algebraics) are available on docs.rs.
 272
 273 The algebraics library was chosen as the
 274 [Crate of the Week for October 8, 2019 on This Week in
 275 Rust](https://this-week-in-rust.org/blog/2019/10/08/this-week-in-rust-307/#crate-of-the-week).
 276
 277 One of the really important things about these libraries: they're not
 278 specifically coded exclusively for Libre-SOC: like Berkeley softfloat itself
 279 (and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git))
 280 they're intended for *general-purpose* use by other projects.  These are
 281 exactly the kinds of side-benefits for the wider Libre community that
 282 sponsorship, from individuals, Foundations (such as NLNet) and Companies
 283 (such as Purism and Raptor CS) brings.
 284
 285 # Kazan Getting a New Shader Compiler IR
 286
 287 (*written kindly by Jacob, a dedicated update on Kazan will definitely
 288 feature in the future*)
 289
After spending several weeks discovering that translating directly from
SPIR-V to LLVM IR, vectorizing, and doing all the other front-end work in a
single step is not really feasible, Jacob has switched to [creating a new
 293 shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161) to allow
 294 decomposing the translation process into several smaller steps.
 295
 296 The IR and
 297 SPIR-V to IR translator are being written simultaneously, since that allows
 298 more easily finding the things that need to be represented in the shader
compiler IR. Because writing both the IR and the SPIR-V translator together is
 300 such a big task, we decided to pick an arbitrary point ([translating a totally
 301 trivial shader into the IR](http://bugs.libre-riscv.org/show_bug.cgi?id=177))
 302 and split it into tasks at that point so Jacob would be able to get paid
 303 after several months of work.
 304
 305 The IR uses structured control-flow inspired by WebAssembly's control-flow
 306 constructs as well as
 307 [SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form) but, instead
 308 of using traditional phi instructions, it uses block and loop parameters and
 309 return values (inspired by
 310 [Cranelift's EBB parameters](https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/docs/ir.md#static-single-assignment-form)
 311 as well as both of the [Rust](https://www.rust-lang.org/) and
 312 [Lua](https://www.lua.org/) programming languages).
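
For readers unfamiliar with the "parameters instead of phi" idea, here is a loose analogy in Python. It is not Kazan's IR syntax. The `run_loop` driver and the `"continue"`/`"break"` protocol are invented for this sketch, purely to show loop-carried values being passed as explicit parameters.

```python
def run_loop(body, *initial):
    """Run a structured loop whose carried values are explicit parameters.

    `body` receives the current loop parameters and returns either
    ("continue", next_parameters) or ("break", results).
    """
    args = initial
    while True:
        action, args = body(*args)
        if action == "break":
            return args


def sum_to_n_body(i, acc, n):
    # In a phi-based IR, `i` and `acc` would each be a phi node merging the
    # entry value with the value from the previous iteration. Here they are
    # simply parameters, and "continue" passes their next values.
    if i > n:
        return "break", (acc,)
    return "continue", (i + 1, acc + i, n)


(result,) = run_loop(sum_to_n_body, 1, 0, 10)
```

Each value is still assigned exactly once per iteration, so the SSA property is kept. The merge points are just expressed at the block boundary rather than as separate instructions.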
 313
 314 The IR has a single pointer type for all data pointers (`data_ptr`),
 315 unlike LLVM IR where pointer types have a type they point to (like `*
 316 i32`, where `i32` is the type the pointer points to).
 317
 318 Because having a serialized form of the IR is important for any good IR,
 319 like LLVM IR, it has a user-friendly textual form that can be both read
 320 and written without losing any information (assuming the IR is valid,
 321 comments are ignored). A binary form may be added later.
 322
 323 Some example IR is
 324 [available in the Kazan repo](https://salsa.debian.org/Kazan-team/kazan/-/blob/master/docs/Shader%20Compiler%20IR%20Example.md).
 325
 326 # OpenPOWER Conference calls
 327
We've now established a routine conference call, every two weeks, with Hugh Blemings,
OpenPOWER Foundation Director, and Timothy Pearson, CEO of Raptor CS.  This
 330 allows us to keep up-to-date (each way) on both our new venture and also
 331 the newly-announced OpenPOWER Foundation effort as it progresses.
 332
 333 One of the most important things that we, Libre-SOC, need, and are
 334 discussing with Hugh and Tim is: a way to switch on/off functionality
 335 in the (limited) 32-bit opcode space, so that we have one mode for
 336 "POWER 3.0B compliance" and another for "things that are absolutely
essential to make a decent GPU".  With these two being
mutually incompatible, this is absolutely critical.
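
What such a switch means for a decoder can be sketched as follows. Everything here is hypothetical: the mode names, the use of primary opcode 5, and the `hypothetical_gpu_op` mnemonic are made up for illustration, and no actual mechanism has been agreed.

```python
from enum import Enum


class Mode(Enum):
    POWER_COMPLIANT = 0
    GPU_EXTENSIONS = 1


# Purely illustrative encodings: primary opcode 14 is addi in both modes,
# while opcode 5 is given a made-up GPU meaning in one mode only.
TABLES = {
    Mode.POWER_COMPLIANT: {14: "addi"},
    Mode.GPU_EXTENSIONS: {14: "addi", 5: "hypothetical_gpu_op"},
}


def decode(insn, mode):
    """Decode the primary opcode of `insn` under the selected mode."""
    primary = (insn >> 26) & 0x3F
    return TABLES[mode].get(primary, "illegal")


same_word = 5 << 26
in_power_mode = decode(same_word, Mode.POWER_COMPLIANT)
in_gpu_mode = decode(same_word, Mode.GPU_EXTENSIONS)
```

The same 32-bit word is an illegal instruction in one mode and a valid operation in the other, which is exactly why the boundary between the two has to be clean.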
 339
 340 Khronos Vulkan Floating-point Compliance is, for example, critical not
 341 just from a Khronos Trademark Compliance perspective, it's essential
 342 from a power-saving and thus commercial success perspective.  If we
 343 have absolute strict compliance with IEEE754 for POWER 3.0B, this will
 344 result in far more silicon than any commercially-competitive GPU on
 345 the market, and we will not be able to sell product.  Thus it is
 346 *commercially* essential to be able to swap between POWER Compliance
 347 and Khronos Compliance *at the silicon level*.
 348
POWER 3.0B does not have C++ style LR/SC atomic operations for example,
and if we have half a **million** 3D GPU data structures **per second**
that need SMP-level inter-core mutexes, and the current POWER 3.0B
multi-instruction atomic operations are used - conforming strictly to
the standard - we're highly likely to see 10 to 15 **percent** of processing
power consumed on spin-locking.  Finding out from Tim on one of these
calls that C++ atomics is something that end-users
have been asking about is therefore a good sign.
 357
New and essential features that could well end up in a future version
of the POWER ISA *need* to be firewalled in a clean way, and we've been
 360 asked to [draft a letter](https://libre-riscv.org/openpower/isans_letter/)
 361 to some of the (very busy) engineers with a huge amount of knowledge
 362 and experience inside IBM, for them to consider.  Some help in reviewing
 363 it would be greatly appreciated.
 364
 365 These and many other things are why the calls with Tim and Hugh are a
 366 good idea.  The amazing thing is that they're taking us seriously, and
 367 we can discuss things like those above with them.
 368
Another nice thing we learned (more on this below) is that Epic Games
and RaptorCS are collaborating to get POWER9 supported in Unreal Engine.
Also, the idea has been very tentatively considered of using our design
for the "boot management" processor, running
 373 [OpenBMC](https://github.com/openbmc/openbmc).  These are early days,
 374 it's just ideas, ok!  Aside from anything, we actually have to get a chip
 375 done, first.
 376
 377 # OpenPower Virtual Coffee Meetings
 378
 379 The "Virtual Coffee Meetings", announced
 380 [here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
 381 are literally open to anyone interested in OpenPOWER (if you're strictly
 382 Libre there's a dial-in method).  These calls are not recorded, it's
 383 just an informal conversation.
 384
 385 What's a really nice surprise is finding
 386 out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
 387 working on OpenPOWER, specifically
 388 [microwatt](https://github.com/antonblanchard/microwatt), being managed
 389 by Anton Blanchard.
 390
 391 A brief discussion led to learning that Paul is looking at adding TLB
 392 (Virtual Memory) support to microwatt, specifically the RADIX TLB.
 393 I therefore pointed him at the same resource
 394 [(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
 395 that Hugh had kindly pointed me at, the week before, and did a
[late night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).
 397
 398 My feeling is that these weekly round-table meetings are going to be
 399 really important for everyone involved in OpenPOWER.  It's a community:
 400 we help each other.
 401
 402 # Sponsorship by RaptorCS with a TALOS II Workstation
 403
 404 With many thanks to Timothy from
 405 [RaptorCS](https://raptorcs.com), we've a new shiny
 406 online server that needs
 407 [setting up](http://bugs.libre-riscv.org/show_bug.cgi?id=265).
 408 This machine is not just a "nice-to-have", it's actually essential for
 409 us to be able to verify against.  As you can see in the bugreport, the idea
 410 is to bootstrap our way from running IEEE754 FP on a *POWER* system
 411 (using typically gnu libm), verifying Jacob's algorithmic FP library
 412 particularly and specifically for its rounding modes and exception modes.
 413
 414 Once that is done, then apart from having a general-purpose library that
 415 is compliant with POWER IEEE754 which *anyone else can use*, we can use
that to run unit tests against our
[hardware IEEE754 FP library](https://git.libre-riscv.org/?p=ieee754fpu.git;a=summary) -
 418 again, a resource that anyone may use in any arbitrary project - verifying
 419 that it is also correct.  This stepping-stone "bootstrap" method we are
 420 deploying all over the place, however to do so we need access to resources
 421 that have correctly-compliant implementations in the first place.  Thus,
 422 the critical importance of access to a TALOS II POWER9 workstation.
 423
 424 # Epic Megagrants
 425
 426 Several months back I got word of the existence of Epic Games' "Megagrants".
 427 In December 2019 they announced that so far they've given
 428 [USD $13 million](https://www.unrealengine.com/en-US/blog/epic-megagrants-reaches-13-million-milestone-in-2019)
to 200 recipients: one of them, the Blender Foundation, received
 430 [USD $1.2 million](https://www.blender.org/press/epic-games-supports-blender-foundation-with-1-2-million-epic-megagrant/)!
 431 This is an amazing and humbling show of support for the 3D Community,
 432 world-wide.
 433
 434 It's not just "games", or products specifically using the Unreal Engine:
 435 they're happy to look at anything that "enhances Libre / Open source"
 436 capabilities for the 3D Graphics Community.
 437
A full hybrid 3D-capable CPU-GPU-VPU which is fully documented, not just in
its capabilities but with [documentation](http://libre-riscv.org) and
[full source code](http://git.libre-riscv.org) that extend
right the way through the *entire development process* down to the bedrock
of the actual silicon - not just the firmware, bootloader and BIOS,
*everything* - in my mind kinda qualifies, in a way that can, in some
delightful way, be characterised delicately as "complete overkill".
 445
 446 Interestingly, guys, if you're reading this: Tim, the CEO of RaptorCS
 447 informs us that you're working closely with his team to get the Unreal
 448 Engine up and running on the POWER architecture?  Wouldn't that be highly
 449 amusing, for us to be able to run the Unreal Engine on the Libre-SOC,
 450 given that it's going to be POWER compatible hardware, as a test,
 451 first initially in FPGA and then in 18-24 months, on actual silicon, eh?
 452
 453 So, as I mentioned
 454 [on the list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
 455 (reiterating what I put in the original application), we're happy with
 456 USD $25,000, we're happy with USD $10 million.  It's really up to you guys,
 457 at Epic Games, as to what level you'd like to see us get to, and how fast.
 458
With USD $600,000, for example, instead of paying USD $1 million to a
proprietary company to license a DDR3 PHY for a limited one-time use and
only a 32-bit wide interface, we can contract SymbioticEDA to *design*
a DDR3 PHY for us, which both we *and the rest of the worldwide Silicon
Community can use without limitation* because we will ask SymbioticEDA
to make the design (and layout) libre-licensed, for anyone to use.
 465
USD $250,000 pays for the mask charges that will allow us to do the 40nm
 467 quad-core ASIC that we have on the roadmap for the second chip. USD
 468 $1m pays for 28nm masks (and so on, in an exponential ramp-up).  No, we
 469 don't want to do that straight away: yes we do want to go through a first
 470 proving test ASIC in 180nm, which, thanks to NLNet, is already funded.
 471 This is just good sane sensible use of funds.
 472
 473 Even USD $25,000 helps us to cover things such as administration of the
 474 website (which is taking up a *lot* of time) and little things that we
 475 didn't quite foresee when putting in the NLNet Grant Applications.
 476
 477 Lastly, one of the conditions as I understood it from the Megagrants
 478 process is that the funds are paid in "stages".  This is exactly
 479 what NLNet does for (and with) us, right now.  If you wanted to save
 480 administrative costs, there may be some benefit to having a conversation
 481 with the [30-year-old](https://nlnet.nl/foundation/history/)
 482 NLNet Charitable Foundation.  Something to think about?
 483
 484 # NLNet Milestone tasks
 485
 486 Part of applying for NLNet's Grants is a requirement to create a list
 487 of tasks, each of which is assigned a budget.  On 100% completion of the task,
 488 donations can be sent out.  With *six* new proposals accepted, each of which
required between five (minimum) and *nineteen* separate and distinct tasks,
 490 a call with Michiel and Joost turned into an unexpected three hour online
 491 marathon, scrambling to write almost fifty bugreports as part of the Schedule
 492 to be attached to each Memorandum of Understanding.  The mailing list
 493 got a
 494 [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html)
 495 right around here.
 496
 497 Which emphasised for us the important need to subdivide the mailing list into
 498 separate lists (below).
 499
 500 # Georgia Tech CREATE-X
 501
 502 (*This section kindly written by Yehowshua*)
 503
 504 Yehowshua is a student at Georgia Tech currently pursuing a Masters in
Computer Engineering - to graduate this summer. He had started working
on Libre-SOC in December and wanted to get Libre-SOC more funding so
he could work on it full time.
 508
 509 He originally asked if the ECE Chair at Georgia Tech would be willing
 510 to fund an in-department effort to deliver an SOC in collaboration
with Libre-SOC (an idea to which he was quite receptive). Through Luke,
Yehowshua got in contact with Christopher Klaus who suggested Yehowshua
 513 should look into Klaus's startup accelerator program Create-X and perhaps
 514 consider taking Libre-SOC down the startup route.  Robert Rhinehart, who
 515 had funded Libre-SOC a little in the past (*note from Luke: he donated
 516 the ZC706 and also funded modernisation of Richard Herveille's excellent
 517 [vga_lcd](https://github.com/RoaLogic/vga_lcd) Library*)
 518 also suggested that Yehowshua
 519 incorporate Libre-SOC with help from Create-X and said he would be willing
 520 to be a seed investor. All this happened by February.
 521
 522 As of March, Yehowshua has been talking with Robert about what type of
 523 customers would be interested in Libre-SOC. Robert is largely interested in
 524 biological applications. Yehowshua also had a couple meetings with Rahul
 525 from Create-X. Yehowshua has started the incorporation of Libre-SOC. The
 526 parent company will probably be called Systèmes-Libres with Libre-SOC
 527 simply being one of the products we will offer. Yehowshua also attended
 528 HPCA in late February and had mentioned Libre-SOC during his talk. People
seemed to find the idea quite interesting.
 530
He will later be speaking with some well-known startup lawyers that have
an HQ in Atlanta to discuss business-related things such as S Corps,
 533 C corps, taxes, wages, equity etc…
 534
Yehowshua plans for Systèmes-Libres to hire full-time employees. Part-time
work on Libre-SOC will still be possible through donations and
support from NLNet and companies like Purism.
 538
 539 Currently, Yehowshua plans to take the Create-X summer launch program
 540 and fund Systèmes-Libres by August. Full time wages would probably be
 541 set around 100k USD.
 542
 543 # LOAD/STORE Buffer and 6600 design documentation
 544
 545 A critical part of this project is not just to create a chip, it's to
 546 *document* the chip design, and the decisions along the way, for
 547 educational, research, and ongoing maintenance purposes.  With an
 548 augmented CDC 6600 design being chosen as the fundamental basis,
 549 [documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
 550 as well as the key differences is particularly important.  The circular
 551 loops in the 6600 are extremely simple, highly effective, and
 552 timing-critical hardware, and James Thornton (the co-designer of the
 553 6600) recognised that it is paradoxically challenging to understand why
 554 so few gates could be so effective (being as they were, literally, the
 555 world's first ever out-of-order superscalar architecture).
 556 Consequently, documenting it just to be able to *develop* it is extremely
 557 important.
 558
 559 We're getting to the point where we need to connect the LOAD/STORE Computation
 560 Units up to an actual memory architecture.  We've chosen
 561 [minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
 562 as the basis because it is written in nmigen, works, and, crucially, uses
 563 wishbone (which we decided to use as the main Bus Backbone a few months ago).
 564
 565 However, minerva is a single-issue 32-bit embedded chip, where it's
 566 perfectly ok to have one single LD/ST operation per clock, and even for
 567 that operation to take a few clock cycles.  By contrast, to get anything
 568 like the level of performance needed of a GPU, we need at least four
 569 64-bit LOADs or STOREs *every clock cycle*.
 570
 571 For a first ASIC from a team that's never done a chip before, this is,
 572 officially, "Bonkers Territory".  Where minerva is doing 32-bit-wide
 573 Buses (and does not support 64-bit LD/ST at all), we need internal
 574 data buses a whopping **2000** wires wide, at minimum.
 575
 576 Let that sink in for a moment.
 577
 578 The reason why the internal buses need to be 2000 wires wide comes down
 579 to the fact that we need, realistically, six to eight LOAD/STORE Computation
 580 Units.  Four of them will be operational, and two to four of them will be
 581 waiting with pending instructions from the multi-issue Vectorisation Engine.
 582
 583 We chose to use a system which expands the first 4 bits of the address,
 584 plus the operation width (1,2,4,8 bytes) into a "bitmap" - a byte-mask -
 585 that corresponds directly with the 16 byte "cache line" byte enable
 586 columns, in the L1 Cache.  These bitmaps can then be "merged" such
 587 that requests that go to the same cache line can be served *in the
 588 same clock cycle* to multiple LOAD/STORE Computation Units.  This
 589 is absolutely critical for effective Vector Processing.
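
As a rough software model of that idea (not the project's actual nmigen
code), the sketch below expands an address and width into a 16-bit
byte-enable mask and ORs together the masks of requests that land on the
same cache line.  It assumes that "the first 4 bits" means the four
low-order address bits; the names `byte_mask` and `merge` are made up for
illustration, and accesses that cross a line boundary are rejected here.

```python
LINE_BYTES = 16  # cache line width in bytes, as described in the text


def byte_mask(addr: int, width: int) -> int:
    """Expand the low 4 address bits plus the width into a byte-enable mask.

    Bit N of the result is set when byte N of the 16-byte line is accessed.
    Assumption: the access stays inside one cache line.
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8 bytes")
    offset = addr & (LINE_BYTES - 1)  # low 4 bits select the starting byte
    if offset + width > LINE_BYTES:
        raise ValueError("access crosses a cache line boundary")
    return ((1 << width) - 1) << offset


def merge(requests):
    """Group (addr, width) requests by cache line and OR their masks."""
    merged = {}
    for addr, width in requests:
        line = addr // LINE_BYTES  # cache line number
        merged[line] = merged.get(line, 0) | byte_mask(addr, width)
    return merged


if __name__ == "__main__":
    # Two 8-byte accesses fill one line; a third goes to the next line.
    for line, mask in merge([(0x1000, 8), (0x1008, 8), (0x1014, 4)]).items():
        print(f"line {line:#x}: mask {mask:016b}")
```

In this model the first two requests collapse into a single full-line
access, which is the kind of merging the text describes as being served in
the same clock cycle.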
 590
 591 Additionally, in order to deal with misaligned memory requests, each of those
 592 units needs to put *two* such 16-byte-wide requests (see where this is going?)
 593 out to the L1 Cache.
 594 So, we now have eight times two times 128 bits, which is a staggering
 595 2048 wires *just for the data*.  There do exist ways to get that down
 596 (potentially to half), and there do exist ways to get that cut in half
 597 again; however, doing so would miss opportunities for merging of requests
 598 into cache lines.
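
The misaligned case and the wire count can be illustrated with another small
software model.  This is only a sketch under the assumptions stated in the
text (16-byte lines, eight units, two requests each); `split_request` and
`data_wires` are hypothetical helper names, not part of the project's code.

```python
LINE_BYTES = 16  # cache line width in bytes
LINE_MASK = (1 << LINE_BYTES) - 1  # 16 byte-enable bits


def split_request(addr: int, width: int):
    """Split one access into (line, byte_mask) parts, two if it straddles.

    The mask is first built up to 23 bits wide, then cut at bit 16: the low
    part belongs to the first cache line, the overflow to the next one.
    """
    offset = addr % LINE_BYTES
    full = ((1 << width) - 1) << offset
    line = addr // LINE_BYTES
    parts = [(line, full & LINE_MASK)]
    overflow = full >> LINE_BYTES
    if overflow:
        parts.append((line + 1, overflow))
    return parts


def data_wires(units: int = 8, requests_per_unit: int = 2,
               line_bytes: int = LINE_BYTES) -> int:
    """Data wires needed if every unit can issue its requests in parallel."""
    return units * requests_per_unit * line_bytes * 8


if __name__ == "__main__":
    # An 8-byte access at offset 13 covers 3 bytes of one line and 5 of the next.
    for line, mask in split_request(0x100D, 8):
        print(f"line {line:#x}: mask {mask:016b}")
    print(data_wires())                      # eight units, two requests each
    print(data_wires(requests_per_unit=1))   # one request per unit
```

With the defaults, `data_wires()` reproduces the eight times two times 128
figure from the text, and dropping to one request per unit shows the first
halving mentioned above.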
 599
 600 At that point, thanks to Mitch Alsup's input (Mitch is the designer of
 601 the Motorola 68000, Motorola 88120, key architecture on AMD's Opteron
 602 Series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that
 603 L1 cache design critically depends on what type of SRAM you have.  We
 604 initially, naively, wanted dual-ported L1 SRAM and that's when Staf
 605 and Mitch taught us that this results in half-duty rate.  Only
 606 1-Read **or** 1-Write SRAM Cells give you fast enough (single-cycle)
 607 data rates to be useable for L1 Caches.
 608
 609 Part of the conversation wandered into
 610 [why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html)
 611 and also produced that
 612 [important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
 613 from both Mitch Alsup and Staf Verhaegen.
 614
 615 (Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
 616 to create Libre-licensed Cell Libraries, busting through one of the -
 617 many - layers of NDAs and reducing NREs and unnecessary and artificial
 618 barriers for ASIC development: I helped him put in the submission, and
 619 he was really happy to do the Cell Libraries that we will be using for
 620 Libre-SOC's 180nm test tape-out in October 2020.)
 621
 622 # Public-Inbox and Domain Migration
 623
 624 As mentioned before, one of the important aspects of this project is
 625 the documentation and archiving.  It also turns out that when working
 626 over an extremely unreliable or ultra-expensive mobile broadband link,
 627 having *local* (offline) access to every available development resource
 628 is critically important.
 629
 630 This is why we are going to the trouble of installing public-inbox.  Not
 631 only does it store a mailing list entirely in a git repository, but the
 632 "web service" which provides access to that git-backed archive can also
 633 be mirrored elsewhere, and can even be *run locally on your own
 634 machine* when offline.  In combination with the right mailer setup,
 635 any replies to the (offline-copied) messages can be stored and forwarded,
 636 to be sent when internet connectivity is restored, so that you remain
 637 a productive collaborative developer throughout.
 638
 639 Now you know why we absolutely do not accept "slack", or other proprietary
 640 "online oh-so-convenient" services.  Not only are they highly inappropriate
 641 for Libre Projects, and not only would we become critically dependent on the
 642 Corporation running the service (yes, github has been entirely offline,
 643 several times), but if we have remote developers (such as myself, working
 644 from Scotland last month with sporadic access to a single Cell Tower) or
 645 developers in emerging markets whose only internet access is via a Library
 646 or Internet Cafe, we absolutely do not want to exclude or penalise such
 647 people just because they have fewer resources.
 648
 649 Fascinatingly, Linus Torvalds is *specifically*
 650 [on record](https://www.linuxjournal.com/content/line-length-limits)
 651 about making sure that "Linux development does not favour wealthy people".
 652
 653 We are also, as mentioned before, moving to a new domain name.  We'll take
 654 the opportunity to fix some of the issues with HTTPS (wrong certificate),
 655 and also do some
 656 [better mailing list names](http://bugs.libre-riscv.org/show_bug.cgi?id=184)
 657 at the same time.
 658
 659 TODO (Veera?) bit about what was actually done, how it links into mailman2.
 660
 661 # OpenPOWER HDL Mailing List opens up
 662
 663 It is early days; however, it is fantastic to see responses from IBM with
 664 regard to requests for access to the POWER ISA Specification
 665 documents in
 666 [machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html).
 667 I took Jeff at his word and explained, in some detail,
 668 [exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html)
 669 machine-readable versions of specifications are critically important.
 670
 671 The takeaway is: *we haven't got time to do manual transliteration of the spec*
 672 into "code".  We're expending considerable effort making sure that we
 673 "bounce" or "bootstrap" off of pre-existing resources, using computer
 674 programs to do so.
 675
 676 This "trick" is something that I learned over 20 years ago, when developing
 677 an SMB Client and Server in something like two weeks flat.  I wrote a
 678 parser which read the packet formats *from the IETF Draft Specification*,
 679 and output C code.
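
To give a flavour of the technique (this is not the original parser, and
the input layout is invented for illustration rather than taken from the
IETF draft), here is a minimal sketch that reads field definitions out of
spec-like text and emits a C struct.  `SPEC_TEXT`, the `Packet:` marker,
the type table and `spec_to_c` are all assumptions.

```python
import re

# Hypothetical spec excerpt: a made-up "TYPE name;  description" layout.
SPEC_TEXT = """
Packet: ExampleHeader
  UCHAR  command;      one-byte command code
  ULONG  status;       32-bit status
  USHORT flags;        16-bit flags
"""

# Assumed mapping from spec type names to C types.
C_TYPES = {"UCHAR": "uint8_t", "USHORT": "uint16_t", "ULONG": "uint32_t"}

FIELD_RE = re.compile(r"^\s*(UCHAR|USHORT|ULONG)\s+(\w+);")


def spec_to_c(text: str) -> str:
    """Turn the field list found in `text` into a C struct definition."""
    name = None
    fields = []
    for line in text.splitlines():
        if line.startswith("Packet:"):
            name = line.split(":", 1)[1].strip()
            continue
        match = FIELD_RE.match(line)
        if match:
            fields.append((C_TYPES[match.group(1)], match.group(2)))
    if name is None:
        raise ValueError("no 'Packet:' line found")
    body = "\n".join(f"    {ctype} {field};" for ctype, field in fields)
    return f"struct {name} {{\n{body}\n}};"


if __name__ == "__main__":
    print(spec_to_c(SPEC_TEXT))
```

The point of the approach is that the specification text stays the single
source of truth: when the document changes, the generated code is simply
regenerated instead of being re-typed by hand.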
 680
 681 This leaves me wondering, as I mention on the HDL list, if we can do the same
 682 thing with large sections of the POWER Spec (*answer as of 3rd April 2020:
 683 [yes](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/power_pseudo.py;h=f2e575e8c5b707e7ec2f8d2ea6ca6d36060e08ad;hb=af3c6727c8bb59623bf5672b867407b5516e8338)*)
 684
 685 # Build Servers, Process Automation, and Reducing Cognitive Load
 686
 687 (*written kindly by Cole*)
 688
 689 Over the past month, Jacob and a new project member, Cole, set up a new
 690 build server for the project. The build server is an old computer that
 691 Jacob wasn't using anymore, which he decided to make available to the
 692 project for running continuous integration (CI) testing for the many
 693 modules and submodules of the project. The build server is a GitLab test
 694 runner instance using a Docker backend. As Luke has taken pains to make
 695 clear
 696 [many times](https://libre-riscv.org/HDL_workflow/),
 697 very large and complex Python projects are guaranteed
 698 to fail without proper, extensive test coverage. This new build server
 699 will allow us to
 700 [automate](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-April/005687.html)
 701 the running, monitoring, and reporting of these
 702 tests, giving us the ability to push a commit and have it automatically
 703 "verified" as cohesive with the existing codebase. Automating feedback
 704 will help provide more confidence to the engineers that their code isn't
 705 breaking some other functionality in a codebase they are working on,
 706 and should also help improve the ease of long-term maintainability
 707 of the code. The more we can automate the menial tasks that have to
 708 be repeated frequently, and are important for success of the project
 709 but are not related to progressing the engineering of the Libre-SOC,
 710 the more productive project members can be.
 711
 712 To help continue to ease such administrative burdens on the engineers,
 713 Cole is also working on a repository of setup automation scripts. The first
 714 script is one that will replicate the setup of Jacob's build server,
 715 so that others who want to contribute computational resources to the
 716 project may do so easily. Cole is also working on a collection of modular
 717 scripts to automate the setup of the development environment for the
 718 HDL workflow and the layout of the SOC, including the installation of
 719 development branches of a substantial number of very complex pieces of
 720 software. This should help ease the process of onboarding new members
 721 to the project, especially some interns that we have coming onboard in
 722 the next few months to do the layout of the chip. These scripts will be
 723 available via the git.libre-riscv.org repository dev-env-setup, at the
 724 [following link](http://git.libre-riscv.org/?p=dev-env-setup.git).
 725
 726 # Conclusion
 727
 728 I'm not going to mention anything about the current world climate: you've
 729 seen enough news reports.  I will say (more about this through the
 730 [EOMA68](https://www.crowdsupply.com/eoma68/micro-desktop) updates) that
 731 I anticipated something like what is happening right now, over ten years
 732 ago.  I wasn't precisely expecting what *has* happened, just the consequences:
 733 world-wide travel shut-down, and for people - the world over - to return to
 734 local community roots.
 735
 736 However what I definitely wasn't expecting was a United States President
 737 to be voted in who was eager and willing to start *and escalate* a Trade
 738 war with China, even during the current world climate where both local
 739 and global collaboration, **not** competition, is more important than
 740 ever before.  The impact of his decisions on the U.S. economy alone, and
 741 the reputation of the whole country, has been detrimental in the extreme.
 742
 743 This combination leaves us - world-wide - with a strong possibility, one
 744 that seemed so "preposterous" that I could in no way discuss it widely, let
 745 alone mention it on something like a Crowdsupply update: that thanks to the
 746 business model on which their entire product lifecycle is predicated,
 747 in combination with the extremely high NREs and development costs for
 748 ASICs (custom silicon costs USD $100 million, these days), several
 749 large Corporations producing proprietary binary-only drivers for
 750 hardware on which we critically rely for our internet-connected way
 751 of life **may soon go out of business**.
 752
 753 Right at a critical time when video conferencing is taking off massively,
 754 your proprietary hardware - your smartphone, your tablet, your laptop,
 755 everything you rely on for connectivity to the rest of the world - may all
 756 of a sudden **no longer get software updates** or, worse,
 757 could even be
 758 [remotely shut down](https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home)
 759 **without warning**.
 760
 761 I do not want to hammer the point home too strongly but you should be
 762 getting, in no uncertain terms, exactly how strategically critical, in
 763 the current world climate, this project just became.  We need to get it
 764 accelerated, completed, and into production, in an expedited and responsible
 765 fashion.
 766