From: Luke Kenneth Casson Leighton Date: Sat, 5 Jan 2019 02:45:01 +0000 (+0000) Subject: add kazan update X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=e7c024b4562aa1a481a212db57191626ce7708ff;p=crowdsupply.git add kazan update --- diff --git a/updates/008_2018dec28_kazan.mdwn b/updates/008_2018dec28_kazan.mdwn index 6a9f576..795c119 100644 --- a/updates/008_2018dec28_kazan.mdwn +++ b/updates/008_2018dec28_kazan.mdwn @@ -1,13 +1,80 @@ -# TODO +# Kazan -intro. bit about hilariously not realising that spir-v was to be -*compiled* (to LLVM-IR)... +So after deciding to sponsor Jacob to work on a 3D Graphics Driver, +for some reason I thought using rust would be a good idea. Normally, +3D Graphics Drivers are written in c or c++ for performance reasons, +however in this case I was attracted to the security and memory-safety +inherent in rust. -# Jacob stuff +Hilariously, it wasn't until some time last week that the way that Vulkan +works actually sank in. I thought it was some sort of interpreter of +a 3D API, just like gallium3d: it most definitely is not. The core of +Vulkan is [SPIR-V](https://en.wikipedia.org/wiki/SPIR-V), an Intermediate +Representation (IR) language whose predecessor, SPIR, was based on LLVM's IR. +SPIR was originally developed for OpenCL Parallel Compute; somewhere along +the line someone realised that its successor, SPIR-V, would also do well +for representing shaders in 3D applications. -todo +So whereas previously I was deeply concerned that I had made a huge mistake +in using rust, actually, the rust driver isn't so much a "driver" as it +is a **compiler**. As in: the purpose of a Vulkan implementation is to +**compile** the 3D shader SPIR-V binary provided by the 3D application +into something that will execute directly on the underlying hardware. -# finishing up +We chose to compile SPIR-V IR into LLVM IR, and for that task, the fact +that the compiler is written in rust does **not** affect performance +**in any way**.
Once compiled to LLVM, the resultant IR will be handed +to the standard LLVM JIT (Just-in-Time) low-level compiler, and it will +execute **directly** as assembler, **and** it will execute in parallel, +as well. -todo +Contrast this with gallium3d-llvmpipe where the API is **interpreted** +(and also single-threaded). +# Example + +So I asked Jacob if he could do a quick write-up of an +[example translation](https://salsa.debian.org/Kazan-team/kazan/blob/master/docs/Example%20Translation%20from%20SPIR-V%20to%20LLVM%20IR.md). +I wanted to see what goes on, as I quite like compilers and language +translators. Also, a couple weeks back he ran into some roadblocks on +how the data structures would work in the compiler, so I figured it +would be nice to do a visual worked example. + +It looks pretty straightforward. Start at c-code, compile to SPIR-V +(the writer of the 3D or OpenCL application does that part). The interesting +bit is that SPIR-V kinda assumes a SIMD (or SIMT - which is basically +"predicated SIMD") micro-architecture. + +Unlike in a standard sequential algorithm, branches are not done as +"branches": they're done by testing a set of conditions (in parallel), +which produces a bit-field of 1s and 0s (representing success or +failure of each of the parallel compares), then the "THEN" part of the +statement - bear in mind this is all parallel - will be executed on each +element where its corresponding "predicate" bit is set to "1", and the +"ELSE" part of the statement will be executed where each bit is "0". + +Predication is not very popular outside of the parallel world, because +CPU cycles are "wasted" by having to send both the "THEN" *and* the "ELSE" +statements through to the execution unit. Remember, though, that in +the Libre-RISCV micro-architecture, as is described in the +update on predication, a zero predication bit results in that element +being **skipped**. 
Whilst it may be sent to the ALU, once the predicate +bit is known, the operation is **cancelled** and the Function Units +may be allocated alternate resources. So, unlike more traditional +Vector and SIMT micro-architectures, our design does not suffer a performance +penalty due to predication. + +We do have a couple of issues to contend with, in LLVM. Firstly: whilst +this is a variable-length vectorisation micro-architecture, LLVM itself +does not yet support variable-length data structures. It's all based around +fixed SIMD. There is work underway to deal with that: we can adjust +accordingly as it happens. +Secondly: LLVM's IR support for predication is not as feature rich as +we would like: it's incomplete. + +However, we have to start somewhere, and, as this is mostly software, there +is plenty of room to improve performance as time and resources allow. +Interestingly, AMD are planning some improvements to LLVM that will help +us out, here. The AMDGPU has similar polymorphic registers, so there are +plans to add in support for register "types" that have the ability to +span (use) more than one "hardware" register. It will be fascinating +to watch that unfold.
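
To make the predicated "THEN"/"ELSE" execution described earlier a bit more concrete, here is a minimal sketch in Python. It only models the semantics: Python runs the elements one after another, whereas the hardware would do the compares and the operations in parallel. The function name `predicated_if_else` and the example data are made up for illustration; none of this is Kazan, SPIR-V or LLVM code.

```python
def predicated_if_else(values, condition, then_op, else_op):
    """Model of predicated branching over a vector of elements.

    Hypothetical helper for illustration only (not part of Kazan or LLVM).
    """
    # Step 1: test the condition on every element, producing one
    # predicate bit per element (1 = compare succeeded, 0 = failed).
    mask = [1 if condition(v) else 0 for v in values]

    result = list(values)

    # Step 2: the "THEN" part is issued for the whole vector, but only
    # takes effect where the predicate bit is 1; the rest are skipped.
    for i, bit in enumerate(mask):
        if bit == 1:
            result[i] = then_op(values[i])

    # Step 3: the "ELSE" part is issued for the whole vector, but only
    # takes effect where the predicate bit is 0.
    for i, bit in enumerate(mask):
        if bit == 0:
            result[i] = else_op(values[i])

    return mask, result


if __name__ == "__main__":
    # if (v >= 0) v = v * 2; else v = -v;
    mask, result = predicated_if_else(
        [3, -1, 4, -5],
        lambda v: v >= 0,
        lambda v: v * 2,
        lambda v: -v,
    )
    print(mask)    # [1, 0, 1, 0]
    print(result)  # [6, 1, 8, 5]
```

Note that both passes walk the full vector: that is the "wasted" work mentioned above, which a design that skips elements with a zero predicate bit avoids paying for.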