From: Luke Kenneth Casson Leighton
Date: Tue, 10 Apr 2018 15:19:58 +0000 (+0100)
Subject: add predication section
X-Git-Tag: convert-csv-opcode-to-binary~5711
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=d4c0c12a9eb5ba05656eee55b2c35af5ad32290e;p=libreriscv.git

add predication section
---

diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn
index 65b69b057..bcd07e40d 100644
--- a/simple_v_extension.mdwn
+++ b/simple_v_extension.mdwn
@@ -64,6 +64,21 @@ of not being widely adopted. I'm inclined towards recommending:
 **TODO**: propose "mask" (predication) registers likewise. combination with standard RV instructions and overflow registers extremely powerful
+## CSRs marking registers as Vector
+
+A 32-bit CSR would be needed (1 bit per integer register) to indicate
+whether a register was, if referred to, implicitly to be treated as
+a vector.
+
+A second 32-bit CSR would be needed (1 bit per floating-point register)
+to indicate whether a floating-point register was to be treated as a
+vector.
+
+In this way any standard (current or future) operation involving
+register operands may detect if the operation is to be vector-vector,
+vector-scalar or scalar-scalar (standard) simply through a single
+bit test.
+
 ## CSR vector-length and CSR SIMD packed-bitwidth
 **TODO** analyse each of these:
@@ -105,6 +120,9 @@ and for the former it would simply be:
 > bitwidth = CSR-Vector_bitwidth[M]
 > vectorlen = CSR-Vector_len[M]
+Alternatives:
+
+* One single "global" vector-length CSR
 ## Stride
@@ -238,14 +256,66 @@ to keep ALU pipelines 100% occupied.
 This very simple proposal offers a way to increase pipeline activity in the one key area which really matters: the inner loop.
-## Mask and Tagging
-
-*TODO: research masks as they can be superb and extremely powerful.
-If B-Extension is implemented and provides Bit-Gather-Scatter it
-becomes really cool and easy to switch out certain indexed values
-from an array of data, but actually BGS **on its own** might be
-sufficient. Bottom line, this is complex, and needs a proper analysis.
-The other sections are pretty straightforward.*
+## Mask and Tagging (Predication)
+
+Tagging (aka Masks aka Predication) is a pseudo-method of implementing
+simplistic branching in a parallel fashion, by allowing execution on
+elements of a vector to be switched on or off depending on the results
+of prior operations in the same array position.
+
+The reason for considering this is simple: by *definition* it
+is not possible to perform individual parallel branches in a SIMD
+(Single-Instruction, **Multiple**-Data) context. Branches (modifying
+of the Program Counter) will result in *all* parallel data having
+a different instruction executed on it: that's just the definition of
+SIMD, and it is simply unavoidable.
+
+So these are the ways in which conditional execution may be implemented:
+
+* explicit compare and branch: BNE x, y -> offs would jump offs
+  instructions if x was not equal to y
+* explicit store of tag condition: CMP x, y -> tagbit
+* implicit (condition-code) ADD results in a carry, carry bit implicitly
+  (or sometimes explicitly) goes into a "tag" (mask) register
+
+The first of these is a "normal" branch method, which is flat-out impossible
+to parallelise without look-ahead and effectively rewriting instructions.
+This would defeat the purpose of RISC.
+
+The latter two are where parallelism becomes easy to do without complexity:
+every operation is modified to be "conditionally executed" (in an explicit
+way directly in the instruction format *or* implicitly).
+
+RVV (Vector-Extension) proposes to have *explicit* storing of the compare
+in a tag/mask register, and to *explicitly* have every vector operation
+*require* that its operation be "predicated" on the bits within an
+explicitly-named tag/mask register.
+
+SIMD (P-Extension) has not yet published precise documentation on what its
+schema is to be: there is however verbal indication at the time of writing
+that:
+
+> The "compare" instructions in the DSP/SIMD ISA proposed by Andes will
+> be executed using the same compare ALU logic for the base ISA with some
+> minor modifications to handle smaller data types. The function will not
+> be duplicated.
+
+This is an *implicit* form of predication as the base RV ISA does not have
+condition-codes or predication. By adding a CSR it becomes possible
+to also tag certain registers as "predicated if referenced as a destination".
+Example:
+
+> # in future operations if r0 is the destination use r5 as
+> # the PREDICATION register
+> IMPLICICSRPREDICATE r0, r5
+> # store the compares in r5 as the PREDICATION register
+> CMPEQ8 r5, r1, r2
+> # r0 is used here. ah ha! that means it's predicated using r5!
+> ADD8 r0, r1, r3
+
+With enough registers (and there are enough registers) some fairly
+complex predication can be set up and yet still execute without significant
+stalling, even in a simple non-superscalar architecture.
 ## Conclusions
@@ -258,7 +328,7 @@ follows:
 * Implicit (indirect) vs fixed (integral) instruction bit-width: indirect
 * Implicit vs explicit type-conversion: explicit
 * Implicit vs explicit inner loops: implicit
-* Tag or no-tag: TODO
+* Tag or no-tag: Complex and needs further thought
 In particular: variable-length vectors came out on top because of the high setup, teardown and corner-cases associated with the fixed width
@@ -876,4 +946,4 @@ pluses:
 * Hwacha
 * Hwacha
 * Vector Workshop
-
+* Predication
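
The three-instruction example in the patch above (IMPLICICSRPREDICATE, CMPEQ8, ADD8) can be modelled in a few lines of Python to make the intended effect concrete. This is only a sketch under stated assumptions, not the proposal's defined semantics: the patch does not say how the predicate bits are laid out or what happens to masked-off elements. Here it is assumed that registers are 32 bits holding four 8-bit lanes, that CMPEQ8 writes one result bit per lane into the low bits of its destination, and that masked-off lanes of a predicated destination keep their previous value. The `Machine` class, its method names and the `pred_for` table are hypothetical names invented for this illustration.

```python
LANES = 4                      # assumption: 32-bit register = four 8-bit lanes
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def lanes(value):
    """Split a register value into its 8-bit lanes (lane 0 is the LSB)."""
    return [(value >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def pack(elems):
    """Pack a list of 8-bit lanes back into a single register value."""
    out = 0
    for i, e in enumerate(elems):
        out |= (e & LANE_MASK) << (LANE_BITS * i)
    return out


class Machine:
    """Hypothetical model; r0 is treated as an ordinary register, as in the example."""

    def __init__(self):
        self.regs = [0] * 32
        # Hypothetical CSR state: destination register -> predicate register.
        self.pred_for = {}

    def implicit_csr_predicate(self, rd, rp):
        # Models IMPLICICSRPREDICATE rd, rp from the example.
        self.pred_for[rd] = rp

    def cmpeq8(self, rd, rs1, rs2):
        # Assumption: one result bit per lane, bit i set when lane i is equal.
        a, b = lanes(self.regs[rs1]), lanes(self.regs[rs2])
        self.regs[rd] = sum(1 << i for i in range(LANES) if a[i] == b[i])

    def add8(self, rd, rs1, rs2):
        a, b = lanes(self.regs[rs1]), lanes(self.regs[rs2])
        old = lanes(self.regs[rd])
        if rd in self.pred_for:
            mask = self.regs[self.pred_for[rd]]
        else:
            mask = (1 << LANES) - 1        # unpredicated: every lane enabled
        # Assumption: masked-off lanes keep the destination's old contents.
        result = [
            (a[i] + b[i]) & LANE_MASK if (mask >> i) & 1 else old[i]
            for i in range(LANES)
        ]
        self.regs[rd] = pack(result)


m = Machine()
m.regs[0] = pack([0xAA, 0xAA, 0xAA, 0xAA])   # old destination contents
m.regs[1] = pack([1, 2, 3, 4])
m.regs[2] = pack([1, 9, 3, 9])               # equal to r1 in lanes 0 and 2
m.regs[3] = pack([10, 20, 30, 40])

m.implicit_csr_predicate(0, 5)   # IMPLICICSRPREDICATE r0, r5
m.cmpeq8(5, 1, 2)                # CMPEQ8 r5, r1, r2
m.add8(0, 1, 3)                  # ADD8 r0, r1, r3 (predicated through r5)

print(bin(m.regs[5]))            # 0b101
print(lanes(m.regs[0]))          # [11, 170, 33, 170]
```

Only lanes 0 and 2 of r0 are updated, because those are the lanes where r1 and r2 compared equal. A real design would also have to decide between keeping and zeroing the masked-off lanes; the sketch picks "keep" purely for illustration.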