So many things have happened since the last update that they actually need to go in the main update, even in summary form. One big thing: [Raptor CS](https://www.raptorcs.com/) sponsored us with remote access to a Monster spec'd TALOS II Workstation!

# Introduction

Here's the summary (if it can be called a summary):

* [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html) that we got the funding (which is open to anyone - hint, hint) resulted in at least three people reaching out to join the team. "We don't need permission to own our own hardware" got a *really* positive reaction.
* New team member Jock (hello Jock!) starts on the coriolis2 layout, after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2 can be used. This resulted in a [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178).
* Work has started on the [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186), verified through [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!) and on a mini-simulator [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143) for verification.
* Jacob's simple-soft-float library is growing [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258) and python bindings.
* Kazan, the Vulkan driver Jacob is writing, is getting a [new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161).
* A conference call with OpenPOWER Foundation Director Hugh Blemings and Timothy Pearson from Raptor CS has been established, every two weeks.
* The OpenPOWER Foundation is also running some open ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/) weekly round-table calls for anyone interested, generally, in OpenPOWER development.
* Tim sponsors our team with access to a Monster TALOS II system with a whopping 128 GB RAM. htop lists a staggering 72 cores (18 real, with 4-way hyperthreading).
* [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html) reached out (hello!) to say they're still considering our request.
* A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted in the completion of the [Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---) and a [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html) of bug reports to the list.
* Immanuel Yehowshua is participating in the Georgia Tech [Create-X](https://create-x.gatech.edu/) Programme, and is establishing a Public Benefit Corporation in Atlanta as an ethical vehicle for VC funding.
* A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216) design and [further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257), including on [comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE), inspired additional writeup on the [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/) page.
* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was installed successfully on the server, which is in the process of moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182), [Libre-SOC](http://libre-soc.org).
* Build servers have been set up, with [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html) being established.

Well dang, as you can see, suddenly it just went ballistic. There are almost certainly things left off the list.
For such a small team there's a heck of a lot going on. We have an awful lot to do, in a short amount of time: the 180nm tape-out is in October 2020 - only 7 months away.

With this update we're doing something slightly different: a request went out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html) asking each of them to say a little bit about what they're doing. This also helps me, because these updates do take quite a bit of time to write.

# NLNet Funding announcement

An announcement went out [last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html) that we'd applied for funding, and we got some great responses and feedback (such as "don't use patented AXI4"). The second time, we sent out a "we got it!" message and got some really nice private and public replies, as well as requests from people to join the team. More on that when it happens.

# Coriolis2 experimentation started

Jock, a really enthusiastic and clearly skilled and experienced python developer, has this to say about coriolis2:

> As a humble Python developer, I understand the unique status and significance of the Coriolis project; nevertheless I cannot help but notice that it has huge room for improvement. I genuinely hope that my participation in libre-riscv will also help improve Coriolis.

This was the short version: a much more [detailed insight](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005478.html) is on the list, which would do well as a bug report. However the time that would take is quite significant. We do have funding available from NLNet, so if there is anyone who would like to take this on, under the supervision of Jean-Paul at LIP6.fr, we can look at facilitating that.
One of the key insights Jock came up with: the coding style, whilst consistent, is something that specifically has to be learned and, being contrary to PEP8 in so many ways, creates an artificially high barrier and learning curve. Even experienced cross-language developers such as myself can *read* such code, but editing it, when the commas separating list items are at the *beginning* of lines, results in syntax errors introduced *without thinking*, because we automatically add a comma *at the end* of a line where one looks to be missing. This is why we insisted on PEP8 in the [HDL workflow](http://libre-riscv.org/HDL_workflow) document.

Other than that: coriolis2 is actually extremely exciting to work with. Anyone who has done manual PCB layout will know quite how much of a relief it is to have auto-routing: this is what coriolis2 has by the bucket-load, *as well* as auto-placement. We are looking at half a *million* objects (Cells) to place. Without an auto-router / auto-placer this is just a flat-out impossible task.

The first step was to [learn and adapt coriolis2](http://bugs.libre-riscv.org/show_bug.cgi?id=178), which was needed to find out how much work would be involved, as much as anything else, in order to be able to accurately assign the fixed budgets to the NLNet milestones. Following on from that, when Jock joined, we needed to work out a compact way to express the [layout of blocks](http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44), and he's well on the way to achieving that.

Some of the pictures from coriolis2 are [stunning](http://bugs.libre-riscv.org/attachment.cgi?id=29). This was an experimental routing of the IEEE754 FP 64-bit multiplier. It took 5 minutes to run, and is around 50,000 gates: as big as most silicon ASICs that have formerly been done with Coriolis2, and 50% of the practical size that can be handed in one go to the auto-place/auto-router.
Other designs using coriolis2 have been of the form where the major "blocks" (such as FPMUL, or the Register File) are laid out automatically in a single-level hierarchy, followed by full and total manual layout from that point onwards, in what is termed in the industry a "Floorplan". With around 500,000 gates to do and many blocks being repeated, this approach is not viable for us. We therefore need a *two*-level or potentially three-level hierarchy.

[Explaining this](http://bugs.libre-riscv.org/show_bug.cgi?id=178#c146) to Jean-Paul was amusing and challenging. Much bashing of heads against walls and keyboards was involved. The basic plan: rather than have coriolis2 perform an *entire* layout, in a flat and all-or-nothing fashion, we need a much more subtle fine-grained approach, where *sub-blocks* are laid out, then *included* at a given level of hierarchy as "pre-done blocks". Save and repeat. This apparently had never been done before, and explaining it in words was extremely challenging: a massive hack (temporarily hand-editing the underlying HDL files in between tasks) was the only way to illustrate it. However once the lightbulb went on, Jean-Paul was able to get coriolis2's c++ code into shape extremely rapidly, and this alone has opened up an *entire new avenue* of potential for coriolis2 to be used in industry for doing much larger ASICs. Which is precisely the kind of thing that our NLNet sponsors (and the EU, from the Horizon 2020 Grant) love. Hooray. Now if only we could actually go to a conference and talk about it.

# POWER ISA decoder and Simulator

*(kindly written by Michael)*

The decoder we have is based on that of IBM's [microwatt reference design](https://github.com/antonblanchard/microwatt). As microwatt's decoder is quite regular, consisting of a bunch of large switch statements returning fields of a struct, we elected not to pursue a direct conversion of the VHDL to nmigen.
Instead, we extracted the information in the switch statements into several [CSV tables](https://libre-riscv.org/openpower/isatables/), and leveraged nmigen to construct the decoder from these tables. We applied the same technique to extract the subfields (register numbers, branch offset, immediates, etc.) from the instruction: Luke converted the information in the POWER ISA specification to text, and wrote a module in python to extract those fields from an instruction.

To test the decoder, we initially verified it against the tables we extracted, and manually against the [POWER ISA specification](https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0). Later, however, we came up with the idea of [verifying the decoder](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76) against the output of the GNU assembler. This is done by selecting an instruction type (integer reg/reg, integer immediate, load/store, etc.), and randomly selecting the opcode, registers, immediates, and other operands. We then feed this instruction to GNU AS to assemble, and the assembled instruction is sent to our decoder. From this, we can verify that the output of the decoder matches what was generated earlier.

We also explored using a similar idea to test the functionality of the entire SOC. By using the [QEMU](https://www.qemu.org/) PowerPC emulator, we can compare the execution of our SOC against that of the emulator, to verify that our decoder and backend are working correctly. We would write snippets of test code (or potentially randomly generate instructions) and send the resulting binary to both the SOC and QEMU. We would then simulate our SOC until it finished executing instructions, and use QEMU's gdb interface to do the same.
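To give a flavour of the subfield extraction described above, here is a hedged Python sketch (not the actual soc.git module: the function names are ours, though the field positions follow the POWER ISA D-form layout, which numbers bits 0..31 from the most-significant bit):

```python
# Hedged sketch of POWER D-form subfield extraction (illustrative
# only, not the actual soc.git decoder module).  The POWER ISA
# numbers instruction bits 0..31 starting at the MSB, so "bit 0"
# is bit 31 in conventional LSB-0 numbering.

def field(insn, start, end):
    """Extract bits start..end (inclusive, IBM MSB-0 numbering)."""
    width = end - start + 1
    shift = 31 - end
    return (insn >> shift) & ((1 << width) - 1)

def decode_d_form(insn):
    """Split a D-form instruction (e.g. addi) into its subfields."""
    return {
        "PO": field(insn, 0, 5),    # primary opcode
        "RT": field(insn, 6, 10),   # target register
        "RA": field(insn, 11, 15),  # source register
        "SI": field(insn, 16, 31),  # immediate (raw, unsigned bits)
    }

# `addi r3, r4, 16` assembles to 0x38640010 (primary opcode 14):
fields = decode_d_form(0x38640010)
assert fields == {"PO": 14, "RT": 3, "RA": 4, "SI": 16}
```

The GNU AS cross-check then amounts to assembling a randomly-generated `addi r3, r4, 16`, feeding the resulting word to the decoder, and asserting that the extracted fields match the operands that were chosen.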
We would then use QEMU's gdb interface to compare the register file and memory with those of our SOC, to verify that it is working correctly. I did some experimentation using this technique to verify a [rudimentary simulator](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=aadaf667eff7317b1aa514993cd82b9abedf1047;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76) of the SOC backend, and it seemed to work quite well.

*(Note from Luke: this automated approach, taking either other people's regularly-written code or actual PDF specifications, not only saves us a vast amount of time, it also ensures that our implementation is correct and does not contain transcription errors.)*

# simple-soft-float Library and POWER FP emulation

The [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float) library is a floating-point library Jacob wrote with the intention of it being a reference implementation of IEEE 754 for hardware testing purposes. It is specifically written to be easy to understand, instead of having the code obscured in pursuit of speed:

* Being easier to understand helps prevent bugs where the code does not match the IEEE spec.
* It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics) library that Jacob wrote, which allows using numbers that behave like exact real numbers, making reasoning about the code simpler.
* It is written in Rust rather than highly-macro-ified C: this helps with readability, since operations aren't obscured, as well as safety, since Rust proves at compile time that the code won't seg-fault unless you specifically opt out of those guarantees by using `unsafe`.

It currently supports 16, 32, 64 and 128-bit FP for RISC-V, along with having a `DynamicFloat` type which allows dynamically specifying all aspects of how a particular floating-point type behaves -- if one wanted, it could be configured as a 2048-bit floating-point type.
It also has Python bindings, thanks to the awesome [PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.

We decided to write simple-soft-float instead of extending the industry-standard [Berkeley softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library because of a range of issues: not supporting Power FP, requiring recompilation to switch which ISA is being emulated, not supporting all the required operations, architectural issues such as depending on global variables, and so on. We are still testing simple-soft-float against Berkeley softfloat where we can, however, since Berkeley softfloat is widely used and highly likely to be correct.

simple-soft-float is [gaining support for Power FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires rewriting a lot of the status-flag handling code, since Power supports a much larger set of floating-point status flags and exceptions than most other ISAs. Thanks to Raptor CS for giving us remote access to a Power9 system, since that makes it much easier to verify that the test cases are correct (more on this below).

API docs for stable releases of both [simple-soft-float](https://docs.rs/simple-soft-float) and [algebraics](https://docs.rs/algebraics) are available on docs.rs. The algebraics library was chosen as the [Crate of the Week for October 8, 2019 on This Week in Rust](https://this-week-in-rust.org/blog/2019/10/08/this-week-in-rust-307/#crate-of-the-week).

One of the really important things about these libraries: they're not coded exclusively for Libre-SOC. Like Berkeley softfloat itself (and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git)) they're intended for *general-purpose* use by other projects. These are exactly the kinds of side-benefits for the wider Libre community that sponsorship - from individuals, Foundations (such as NLNet) and companies (such as Purism and Raptor CS) - brings.
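The cross-checking principle at work here (testing one implementation against a trusted reference) can be illustrated in plain Python - a toy sketch of the idea, not simple-soft-float's actual API. `fractions.Fraction` converts a float exactly, sums of fractions are exact rationals, and converting back to `float` performs a single correctly-rounded (round-to-nearest-even) conversion, which is precisely the result an IEEE754 addition is required to produce:

```python
from fractions import Fraction

def reference_add(a: float, b: float) -> float:
    """Reference IEEE754 binary64 addition: take the exact
    rational sum, then round once, correctly, back to float.
    (CPython's Fraction-to-float conversion uses correctly-rounded
    integer division, i.e. round-to-nearest-even.)"""
    return float(Fraction(a) + Fraction(b))

# cross-check the host's (hardware) FP adder against the reference,
# the same way simple-soft-float is checked against Berkeley softfloat
for a, b in [(0.1, 0.2), (1e16, 1.0), (1.5, -2.25), (3.0, 4.5e-300)]:
    assert a + b == reference_add(a, b)
```

The same pattern scales up: substitute the host's `+` with the implementation under test, and the `Fraction`-based reference with a known-good implementation such as gnu libm running on a real POWER9 system.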
# Kazan Getting a New Shader Compiler IR

After spending several weeks only to discover that translating directly from SPIR-V to LLVM IR - vectorizing, and all the other front-end stuff, all in a single step - is not really feasible, Jacob has switched to [creating a new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161), to allow decomposing the translation process into several smaller steps. The IR and the SPIR-V-to-IR translator are being written simultaneously, since that allows more easily finding the things that need to be represented in the shader compiler IR. Because writing both the IR and the SPIR-V translator together is such a big task, we decided to pick an arbitrary point ([translating a totally trivial shader into the IR](http://bugs.libre-riscv.org/show_bug.cgi?id=177)) and split the work into tasks at that point, so that Jacob would be able to get paid after several months of work.

The IR uses structured control-flow inspired by WebAssembly's control-flow constructs, as well as [SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form); but instead of using traditional phi instructions, it uses block and loop parameters and return values (inspired by [Cranelift's EBB parameters](https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/docs/ir.md#static-single-assignment-form) as well as the [Rust](https://www.rust-lang.org/) and [Lua](https://www.lua.org/) programming languages). The IR has a single pointer type for all data pointers (`data_ptr`), unlike LLVM IR, where pointer types have a type they point to (like `i32*`, where `i32` is the type the pointer points to).

Because having a serialized form of the IR is important for any good IR, like LLVM IR it has a user-friendly textual form that can be both read and written without losing any information (assuming the IR is valid; comments are ignored). A binary form may be added later.
Some example code (the IR is likely to change somewhat):

```
# this is a comment, comments go from the `#` character
# to the end of the line.
fn function1[] -> ! {
    # declares a function named function1 that takes
    # zero parameters and doesn't return
    # (the return type is !, taken from Rust).
    # If the function could return, there would instead be
    # a list of return types:
    # fn my_fn[] -> [i32, i64] {...}
    # my_fn returns an i32 and an i64. The multiple
    # returned values is inspired by Lua's multiple return values.

    # the hints for this function
    hints {
        # there are no inlining hints for this function
        inlining_hint: none,
        # this function doesn't have a side-effect hint
        side_effects: normal,
    }
    # function local variables
    {
        # the local variable is an i32 with an
        # alignment of 4 bytes
        i32, align: 0x4 -> local_var1: data_ptr;
        # the pointer to the local variable is
        # assigned to local_var1 which has the type data_ptr
    }
    # the function body is a single block -- block1.
    # block1's return types are instead attached to the
    # function signature above
    # (the `-> !` in the `fn function1[] -> !`).
    block1 {
        # the first instruction is a loop named loop1.
        # the initial value of loop_var is the_const,
        # which is a named constant.
        # the value of the_const is the address of the
        # function `function1`.
        loop loop1[the_const: fn function1] -> ! {
            # loop1 takes 1 parameter, which is assigned
            # to loop_var. the type of loop_var is a pointer to a
            # function which takes no parameters and doesn't
            # return.
            -> [loop_var: fn[] -> !];
            # the loop body is a single block -- block2.
            # block2's return value definitions are instead
            # attached to the loop instruction above
            # (the `-> !` in the `loop loop1[...] -> !`).
            block2 {
                # block3 is a block instruction, it returns
                # two values, which are assigned to a and b.
                # Both of a and b have type i32.
                block block3 -> [a: i32, b: i32] {
                    # the only way a block can return is by
                    # being broken out of using the break
                    # instruction. It is invalid for execution
                    # to reach the end of a block.

                    # this break instruction breaks out of
                    # block3, making block3 return the
                    # constants 1 and 2, both of type i32.
                    break block3[1i32, 2i32];
                };
                # an add instruction. The instruction adds
                # the value `a` (returned by block3 above) to
                # the constant `increment` (which is an i32
                # with the value 0x1), and stores the
                # result in the value `"a"1`. The source-code
                # location for the add instruction is specified
                # as being line 12, column 34, in the file
                # `source_file.vertex`.
                add [a, increment: 0x1i32] -> ["a"1: i32] @ "source_file.vertex":12:34;
                # The `"a"1` name is stored as just `a` in
                # the IR, where the 1 is a numerical name
                # suffix to differentiate between the two
                # values with name `a`. This allows robustly
                # handling duplicate names, by using the
                # numerical name suffix to disambiguate.
                #
                # If a name is specified without the numerical
                # name suffix, the suffix is assumed to be the
                # number 0. This also allows handling names that
                # have unusual characters or are just the empty
                # string by using the form with the numerical
                # suffix:
                # `""0` (empty string)
                # `"\n"0` (a newline)
                # `"\u{12345}"0` (the unicode scalar value 0x12345)

                # this continue instruction jumps back to
                # the beginning of loop1, supplying the new
                # values of the loop parameters. In this case,
                # we just supply loop_var as the value for
                # the parameter, which just gets assigned to
                # loop_var in the next iteration.
                continue loop1[loop_var];
            }
        };
    }
}
```

# OpenPOWER Conference calls

We've now established a routine two-week conference call with Hugh Blemings, OpenPOWER Foundation Director, and Timothy Pearson, CEO of Raptor CS. This allows us to keep up-to-date (each way) on both our new venture and also the newly-announced OpenPOWER Foundation effort as it progresses.
One of the most important things that we, Libre-SOC, need, and are discussing with Hugh and Tim, is a way to switch on/off functionality in the (limited) 32-bit opcode space, so that we have one mode for "POWER 3.0B compliance" and another for "things that are absolutely essential to make a decent GPU". With these two being mutually exclusive and incompatible, this is absolutely critical.

Khronos Vulkan floating-point compliance, for example, is critical not just from a Khronos Trademark Compliance perspective: it's essential from a power-saving, and thus commercial-success, perspective. If we have absolutely strict compliance with IEEE754 for POWER 3.0B, this will result in far more silicon than any commercially-competitive GPU on the market, and we will not be able to sell product. Thus it is *commercially* essential to be able to swap between POWER Compliance and Khronos Compliance *at the silicon level*.

POWER 3.0B does not have C++-style atomic operations, for example, and if we have half a **million** 3D GPU data structures **per second** that need SMP-level inter-core mutexes, and the current POWER 3.0B multi-instruction atomic operations are used - conforming strictly to the standard - we're highly likely to see 10 to 15 **percent** of processing power consumed on spin-locking. Finding out from Tim, on one of these calls, that C++ atomics are something end-users have been asking about is therefore a good sign.

Adding new and essential features that could well end up in a future version of the POWER ISA *needs* to be firewalled in a clean way, and we've been asked to [draft a letter](https://libre-riscv.org/openpower/isans_letter/) to some of the (very busy) engineers with a huge amount of knowledge and experience inside IBM, for them to consider. Some help in reviewing it would be greatly appreciated.

These and many other things are why the calls with Tim and Hugh are a good idea.
The amazing thing is that they're taking us seriously, and we can discuss things like those above with them. Other nice things we learned (more on this below) are that Epic Games and RaptorCS are collaborating to get POWER9 supported in Unreal Engine, and that the idea has been very tentatively considered of using our design for the "boot management" processor, running [OpenBMC](https://github.com/openbmc/openbmc). These are early days, it's just ideas, ok! Aside from anything, we actually have to get a chip done, first.

# OpenPOWER Virtual Coffee Meetings

The "Virtual Coffee Meetings", announced [here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/), are literally open to anyone interested in OpenPOWER (if you're strictly Libre, there's a dial-in method). These calls are not recorded; it's just an informal conversation.

What was a really nice surprise is finding out that Paul Mackerras, whom I used to work with 20 years ago, is *also* working on OpenPOWER, specifically [microwatt](https://github.com/antonblanchard/microwatt), being managed by Anton Blanchard. A brief discussion led to learning that Paul is looking at adding TLB (Virtual Memory) support to microwatt, specifically the RADIX TLB. I therefore pointed him at the same resource [(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental) that Hugh had kindly pointed me at the week before, and did a [late-night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).

My feeling is that these weekly round-table meetings are going to be really important for everyone involved in OpenPOWER. It's a community: we help each other.

# Sponsorship by RaptorCS with a TALOS II Workstation

With many thanks to Timothy from [RaptorCS](https://raptorcs.com), we have a shiny new server online that needs [setting up](http://bugs.libre-riscv.org/show_bug.cgi?id=265). This machine is not just a "nice-to-have": it's actually essential for us to be able to verify against.
As you can see in the bugreport, the idea is to bootstrap our way up: first, run IEEE754 FP on a *POWER* system (using, typically, gnu libm), verifying Jacob's algorithmic FP library, particularly and specifically its rounding modes and exception modes. Once that is done, then apart from having a general-purpose library that is compliant with POWER IEEE754, which *anyone else can use*, we can use it to run unit tests against our [hardware IEEE754 FP library](https://git.libre-riscv.org/?p=ieee754fpu.git;a=summary) - again, a resource that anyone may use in any arbitrary project - verifying that it is also correct. We are deploying this stepping-stone "bootstrap" method all over the place; however, to do so, we need access to resources that have correctly-compliant implementations in the first place. Thus the critical importance of access to a TALOS II POWER9 workstation.

# Epic Megagrants

Several months back I got word of the existence of Epic Games' "Megagrants". In December 2019 they announced that they've so far given [USD $13 million](https://www.unrealengine.com/en-US/blog/epic-megagrants-reaches-13-million-milestone-in-2019) to 200 recipients: one of them, the Blender Foundation, received [USD $1.2 million](https://www.blender.org/press/epic-games-supports-blender-foundation-with-1-2-million-epic-megagrant/)! This is an amazing and humbling show of support for the 3D community, world-wide. It's not just "games", or products specifically using the Unreal Engine: they're happy to look at anything that "enhances Libre / Open source" capabilities for the 3D Graphics Community.
A full hybrid 3D-capable CPU-GPU-VPU which is fully documented - where the [documentation](http://libre-riscv.org) and [full source code](http://git.libre-riscv.org) extend right the way through the *entire development process* down to the bedrock of the actual silicon, not just the firmware, bootloader and BIOS, *everything* - in my mind kinda qualifies, in a way that can, in some delightful way, be characterised delicately as "complete overkill".

Interestingly, guys, if you're reading this: Tim, the CEO of RaptorCS, informs us that you're working closely with his team to get the Unreal Engine up and running on the POWER architecture? Wouldn't it be highly amusing for us to be able to run the Unreal Engine on the Libre-SOC, given that it's going to be POWER-compatible hardware - as a test, first initially in FPGA and then, in 18-24 months, on actual silicon, eh?

So, as I mentioned [on the list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html) (reiterating what I put in the original application), we're happy with USD $25,000; we're happy with USD $10 million. It's really up to you guys at Epic Games as to what level you'd like to see us get to, and how fast. With USD $600,000, for example, instead of paying USD $1 million to a proprietary company to license a DDR3 PHY for a limited one-time use and only a 32-bit-wide interface, we can contract SymbioticEDA to *design* a DDR3 PHY for us, which both we *and the rest of the worldwide Silicon Community can use without limitation*, because we will ask SymbioticEDA to make the design (and layout) libre-licensed, for anyone to use. USD $250,000 pays for the mask charges that will allow us to do the 40nm quad-core ASIC that we have on the roadmap for the second chip. USD $1m pays for 28nm masks (and so on, in an exponential ramp-up).
No, we don't want to do that straight away: yes, we do want to go through a first proving test ASIC in 180nm which, thanks to NLNet, is already funded. This is just good, sane, sensible use of funds. Even USD $25,000 helps us to cover things such as administration of the website (which is taking up a *lot* of time) and little things that we didn't quite foresee when putting in the NLNet Grant Applications.

Lastly, one of the conditions as I understood it from the Megagrants process is that the funds are paid in "stages". This is exactly what NLNet does for (and with) us, right now. If you wanted to save administrative costs, there may be some benefit to having a conversation with the [30-year-old](https://nlnet.nl/foundation/history/) NLNet Charitable Foundation. Something to think about?

# NLNet Milestone tasks

Part of applying for NLNet's Grants is a requirement to create a list of tasks, each of which is assigned a budget: on 100% completion of a task, donations can be sent out. With *six* new proposals accepted, each of which required between five (minimum) and *nineteen* separate and distinct tasks, a call with Michiel and Joost turned into an unexpected three-hour online marathon, scrambling to write almost fifty bugreports as part of the Schedule to be attached to each Memorandum of Understanding. The mailing list got a [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html) right around here. This emphasised for us the need to subdivide the mailing list into separate lists (below).

# Georgia Tech CREATE-X

*(This section kindly written by Yehowshua)*

Yehowshua is a student at Georgia Tech, currently pursuing a Masters in Computer Engineering, due to graduate this summer. He started working on LibreSOC in December and wanted to get LibreSOC more funding so that he could work on it full time.
He originally asked if the ECE Chair at Georgia Tech would be willing to fund an in-department effort to deliver an SOC in collaboration with LibreSOC (an idea to which the Chair was quite receptive). Through Luke, Yehowshua got in contact with Christopher Klaus, who suggested that Yehowshua look into Klaus's startup accelerator program Create-X, and perhaps consider taking LibreSOC down the startup route. Robert Rhinehart, who had funded LibreSOC a little in the past (*note from Luke: he donated the ZC706 and also funded modernisation of Richard Herveille's excellent [vga_lcd](https://github.com/RoaLogic/vga_lcd) library*), also suggested that Yehowshua incorporate LibreSOC with help from Create-X, and said he would be willing to be a seed investor. All this happened by February.

As of March, Yehowshua has been talking with Robert about what type of customers would be interested in LibreSOC. Robert is largely interested in biological applications. Yehowshua also had a couple of meetings with Rahul from Create-X, and has started the incorporation of LibreSOC. The parent company will probably be called Systèmes-Libres, with LibreSOC simply being one of the products on offer.

Yehowshua also attended HPCA in late February and mentioned LibreSOC during his talk; people seemed to find the idea quite interesting. He will later be speaking with some well-known startup lawyers who have an HQ in Atlanta, to discuss business-related things such as S Corps, C Corps, taxes, wages, equity etc.

Yehowshua plans for Systèmes-Libres to hire full-time employees. Part-time work on Libre-SOC will still be possible through donations and support from NLNet and companies like Purism. Currently, Yehowshua plans to take the Create-X summer launch program and fund Systèmes-Libres by August. Full-time wages would probably be set around 100k USD.
# LOAD/STORE Buffer and 6600 design documentation

A critical part of this project is not just to create a chip: it is to *document* the chip design, and the decisions made along the way, for educational, research, and ongoing maintenance purposes. With an augmented CDC 6600 design chosen as the fundamental basis, [documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/), as well as the key differences, is particularly important. The circular loops of the 6600 are extremely simple and highly effective in hardware, yet timing-critical: even James Thornton (the co-designer of the 6600) recognised how paradoxically challenging it is to understand why so few gates could be so effective (the 6600 being, literally, the world's first ever out-of-order superscalar architecture). Consequently, documenting it just to be able to *develop* it is extremely important.

We're getting to the point where we need to connect the LOAD/STORE Computation Units up to an actual memory architecture. We've chosen [minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py) as the basis because it is written in nmigen, works, and, crucially, uses wishbone (which we decided to use as the main Bus Backbone a few months ago).

However, minerva is a single-issue 32-bit embedded chip, where it is perfectly ok to have one single LD/ST operation per clock, and even for that operation to take a few clock cycles. To get anything like the level of performance needed of a GPU, we need at least four 64-bit LOADs or STOREs *every clock cycle*. For a first ASIC from a team that has never done a chip before, this is, officially, "Bonkers Territory". Where minerva has 32-bit-wide buses (and does not support 64-bit LD/ST at all), we need internal data buses a minimum of a whopping **2000** wires wide. Let that sink in for a moment.
The reason the internal buses need to be 2000 wires wide comes down to the fact that we need, realistically, six to eight LOAD/STORE Computation Units: four of them operational, and two to four of them waiting with pending instructions from the multi-issue Vectorisation Engine.

We chose a system which expands the first 4 bits of the address, plus the operation width (1, 2, 4 or 8 bytes), into a "bitmap" - a byte-mask - that corresponds directly with the 16-byte "cache line" byte-enable columns in the L1 Cache. These bitmaps can then be "merged", such that requests which go to the same cache line can be served *in the same clock cycle* to multiple LOAD/STORE Computation Units. This is absolutely critical for effective Vector Processing. Additionally, in order to deal with misaligned memory requests, each of those units needs to put out *two* such 16-byte-wide requests (see where this is going?) to the L1 Cache.

So we now have eight times two times 128 bits, which is a staggering 2048 wires *just for the data*. There do exist ways to get that down (potentially to half), and ways to cut it in half again; however, doing so would miss opportunities for merging requests into cache lines.

At that point, thanks to Mitch Alsup's input (Mitch is the designer of the Motorola 68000 and Motorola 88120, and a key architect on AMD's Opteron Series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that L1 cache design critically depends on what type of SRAM you have. We initially, naively, wanted dual-ported L1 SRAM, and that's when Staf and Mitch taught us that this results in a half-duty rate. Only 1-Read **or** 1-Write SRAM Cells give you fast enough (single-cycle) data rates to be useable for L1 Caches.
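To make the byte-mask scheme concrete, here is a rough Python sketch of the idea (the function names and exact encoding here are our illustration for this update, not code from the soc repository): the low 4 bits of the address plus the operation width expand into a 16-bit byte-enable mask per cache line, a misaligned access spills into a second mask for the next line, and requests whose masks do not collide can be merged into one cache-line access.

```python
def byte_mask(addr: int, width: int) -> tuple[int, int]:
    """Expand the low 4 bits of an address plus an operation width
    (1, 2, 4 or 8 bytes) into a pair of 16-bit byte-enable masks:
    one for the addressed 16-byte cache line, and one for the next
    line (non-zero only when the access is misaligned)."""
    offset = addr & 0xF                  # byte position within the 16-byte line
    mask = ((1 << width) - 1) << offset  # one enable bit per byte covered
    lo = mask & 0xFFFF                   # bytes within this cache line
    hi = mask >> 16                      # spill-over into the next line
    return lo, hi

def can_merge(masks: list[int]) -> bool:
    """Requests to the same cache line can be served in the same
    clock cycle if their byte-enable masks do not overlap."""
    seen = 0
    for m in masks:
        if seen & m:
            return False
        seen |= m
    return True
```

For example, an 8-byte LOAD at address 0x1C (offset 12 within its line) produces mask 0xF000 for its own line plus 0xF spilling into the next line - which is exactly why every unit has to drive *two* 16-byte-wide requests.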
Part of the conversation has wandered into [why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html) as well as receiving that [important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html) from both Mitch Alsup and Staf Verhaegen. (Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/) to create Libre-licensed Cell Libraries, busting through one of the - many - layers of NDAs and reducing NREs and unnecessary, artificial barriers for ASIC development. I helped him put in the submission, and he was really happy to do the Cell Libraries that we will be using for LibreSOC's 180nm test tape-out in October 2020.)

# Public-Inbox and Domain Migration

As mentioned before, one of the important aspects of this project is documentation and archiving. It also turns out that when working over an extremely unreliable or ultra-expensive mobile broadband link, having *local* (offline) access to every available development resource is critically important. This is why we are going to the trouble of installing public-inbox: not only is the mailing list stored entirely in a git repository, but the "web service" which provides access to that git-backed archive can not only be mirrored elsewhere, it can be *run locally on your own machine* even when offline. In combination with the right mailer setup, replies to the (offline-copied) messages can be stored and forwarded once internet connectivity is restored, so a developer can remain productive and collaborative while disconnected. Now you know why we absolutely do not accept "slack", or other proprietary "online oh-so-convenient" services.
Not only are such services highly inappropriate for Libre Projects, and not only would we become critically dependent on the Corporation running them (yes, github has been entirely offline, several times): we have remote developers (such as myself, working from Scotland last month with sporadic access to a single Cell Tower), and potentially developers in emerging markets whose only internet access is via a Library or Internet Cafe, and we absolutely do not want to exclude or penalise such people just because they have fewer resources. Fascinatingly, Linus Torvalds is *specifically* [on record](https://www.linuxjournal.com/content/line-length-limits) about making sure that "Linux development does not favour wealthy people".

We are also, as mentioned before, moving to a new domain name. We'll take the opportunity to fix some of the issues with HTTPS (wrong certificate), and also pick some [better mailing list names](http://bugs.libre-riscv.org/show_bug.cgi?id=184) at the same time.

TODO (Veera?) bit about what was actually done, how it links into mailman2.

# OpenPOWER HDL Mailing List opens up

It is early days; however, it is fantastic to see responses from IBM to requests for access to the POWER ISA Specification documents in [machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html). I took Jeff at his word and explained, in some detail, [exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html) machine-readable versions of specifications are critically important. The takeaway is: *we haven't got time to do manual transliteration of the spec* into "code". We're expending considerable effort making sure that we "bounce" or "bootstrap" off of pre-existing resources, using computer programs to do so. This "trick" is something I learned over 20 years ago, when developing an SMB Client and Server in something like two weeks flat.
I wrote a parser which read the packet formats *from the IETF Draft Specification* and output C code. This leaves me wondering, as I mention on the HDL list, whether we can do the same thing with large sections of the POWER Spec.

# Build Servers

TODO

# Conclusion

I'm not going to mention anything about the current world climate: you've seen enough news reports. I will say (more about this through the [EOMA68](https://www.crowdsupply.com/eoma68/micro-desktop) updates) that I anticipated something like what is happening right now over ten years ago. I wasn't precisely expecting what *has* happened, just the consequences: world-wide travel shut-down, and for people - the world over - to return to local community roots. What I definitely wasn't expecting, however, was a United States President to be voted in who was eager and, frankly, stupid enough to start *and escalate* a Trade war with China. The impact on the U.S. economy alone, and on the reputation of the whole country, has been detrimental in the extreme.

This combination leaves us - world-wide - with a strong possibility that once seemed so "preposterous" that I could in no way discuss it widely, let alone mention it on something like a Crowdsupply update: that, thanks to the business model on which their entire product lifecycle is predicated, in combination with the extremely high NREs and development costs for ASICs (custom silicon costs USD $100 million, these days), several large Corporations producing proprietary binary-only drivers for hardware on which we critically rely for our internet-connected way of life **may soon go out of business**.
Right at a critical time when video conferencing is taking off massively, your proprietary hardware - your smartphone, your tablet, your laptop, everything you rely on for connectivity to the rest of the world - could all of a sudden stop receiving **software updates**, or, worse, your products could even be [remotely shut down](https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home) **without warning**. I do not want to hammer the point home too strongly, but you should be getting, in no uncertain terms, exactly how strategically critical, in the current world climate, this project just became. We need to get it accelerated, completed, and into production, in an expedited and responsible fashion.