From 0670d69d34bcf34636f97a28000e548adbaa5525 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Sun, 4 Nov 2018 09:12:18 +0000
Subject: [PATCH] add zero predication section

---
 simple_v_extension/specification.mdwn | 115 +++++++++++++++++++++++++-
 1 file changed, 114 insertions(+), 1 deletion(-)

diff --git a/simple_v_extension/specification.mdwn b/simple_v_extension/specification.mdwn
index 02fbe5c37..cb21c6e67 100644
--- a/simple_v_extension/specification.mdwn
+++ b/simple_v_extension/specification.mdwn
@@ -1659,7 +1659,7 @@ The end result is that elements 0 and 1 end up in x8, with element 8 being
 shifted up 32 bits, and so on, until finally element 6 is in the
 LSBs of x11.
 
-Note that whilst the memory addressing table is shown left-to-right byte order, 
+Note that whilst the memory addressing table is shown left-to-right byte order,
 the registers are shown in right-to-left (MSB) order. This does **not**
 imply that bit or byte-reversal is carried out: it's just easier to visualise
 memory as being contiguous bytes, and emphasises that registers are not
@@ -1873,6 +1873,119 @@ Polymorphic variant:
 * add @ max(rs1, 12) bits
 * RD @ rd bits. sign-extend to rd if rd > max(rs1, 12) otherwise truncate
 
+# Predication Element Zeroing
+
+The introduction of zeroing on traditional vector predication is usually
+intended as an optimisation for lane-based microarchitectures with register
+renaming to be able to save power by avoiding a register read on elements
+that are passed en-masse through the ALU. Simpler microarchitectures
+do not have this issue: they simply do not pass the element through to
+the ALU at all, and therefore do not store it back in the destination.
+More complex non-lane-based micro-architectures can, when zeroing is
+not set, use the predication bits to simply avoid sending element-based
+operations to the ALUs entirely: thus, over the long term, potentially
+keeping all ALUs 100% occupied even when elements are predicated out.
+
+SimpleV's design principle is not based on or influenced by
+microarchitectural design factors: it is a hardware-level API.
+Therefore, looking purely at whether zeroing is *useful* or not
+(whether fewer instructions are needed for certain scenarios),
+given that a case can be made for zeroing *and* non-zeroing, the
+decision was taken to add support for both.
+
+Zeroing on predication for arithmetic operations is taken from
+the destination register's predicate, i.e. the predication *and*
+zeroing settings to be applied to the whole operation come from the
+CSR Predication table entry for the destination register.
+Thus when zeroing is set on predication of a destination element,
+if the predication bit is clear, then the destination element is *set*
+to zero (twin-predication is slightly different, and will be covered
+next).
+
+Thus the pseudo-code loop for a predicated arithmetic operation
+is modified as follows:
+
+    for (i = 0; i < VL; i++)
+      if not zeroing: # an optimisation
+        while (!(predval & 1<