X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Fsv%2Fldst.mdwn;h=ca67a4f004ba1461da6fd59f5613fa0a9168961e;hb=0a873ae35b9633a01c0090a30d919cc8c979cdfc;hp=ff6159487a2e525a9b8b21f370212045f21b23fb;hpb=5cbd23c4a44726cdc8e4e29fe6f530401af84542;p=libreriscv.git

diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index ff6159487..ca67a4f00 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -11,15 +11,44 @@ Links:
 * 
 * [[simple_v_extension/specification/ld.x]]
 
+# Rationale
+
+Every Vector ISA, going back fifty years, has extensive and comprehensive
+Load and Store operations that go far beyond the capabilities of Scalar
+RISC or CISC processors, yet at heart, on an individual element
+basis, these are no different from their RISC Scalar equivalents.
+
+The resource savings from Vector LD/ST are significant and stem from
+the fact that one single instruction can trigger a dozen element-level
+Memory accesses (or, in some microarchitectures such as Cray or NEC SX Aurora, hundreds).
+
+Additionally, and simply: if the Arithmetic side of an ISA supports
+Vector Operations, then in order to keep the ALUs 100% occupied, the
+Memory infrastructure (and the ISA itself) correspondingly needs Vector
+Memory Operations as well.
+
+Vectorised Load and Store also presents an extra dimension (literally)
+which creates scenarios, unique to Vector applications, that a Scalar
+(or even a SIMD) ISA simply never encounters. SVP64 endeavours to
+add such modes without changing the behaviour of the underlying Base
+(Scalar) v3.0B operations.
+
+# Modes overview
+
 Vectorisation of Load and Store requires creation, from scalar operations,
 a number of different modes:
 
 * fixed stride (contiguous sequence with no gaps) aka "unit" stride
 * element strided (sequential but regularly offset, with gaps)
 * vector indexed (vector of base addresses and vector of offsets)
-* fail-first on the same (where it makes sense to do so)
+* Speculative fail-first (where it makes sense to do so)
 * Structure Packing (covered in SV by [[sv/remap]]).
 
+Also included in SVP64 LD/ST are both signed and unsigned Saturation,
+as well as Element-width overrides and Twin-Predication.
+
+# Vectorisation of Scalar Power ISA v3.0B
+
 OpenPOWER Load/Store operations may be seen from [[isa/fixedload]] and
 [[isa/fixedstore]] pseudocode to be of the form:
 
@@ -109,13 +138,17 @@ Indexed LD is:
     if (RAupdate.isvec) while (!(ps & 1<
+LD/ST ffirst treats the first LD/ST in a vector (element 0) as an
+ordinary one: exceptions occur "as normal". However, for elements 1
+and above, if an exception would occur, VL is **truncated** so that the
+previous element is the last: the exception is **not** raised, because the
+LD/ST was effectively speculative.
+
+ffirst LD/ST to multiple pages via a Vectorised Index base is considered a security risk: it could be abused to probe multiple pages in rapid succession and obtain feedback on which pages would fault. Therefore Vector Indexed LD/ST is prohibited entirely, and the Mode bit is instead used for element-strided LD/ST. See
     for(i = 0; i < VL; i++)
         reg[rt + i] = mem[reg[ra] + i * reg[rb]];
@@ -244,6 +287,25 @@ to *always* set VL=1 which will have
 the effect of terminating any speculative probing (and also adversely
 affect performance), but will at least not require applications to be rewritten.
 
+Low-performance, simpler hardware implementations may likewise
+choose to always set VL=1, as the bare minimum compliant implementation of
+LD/ST Fail-First. It is however critically important to remember that
It is however critically important to remember that +the first element LD/ST **MUST** be treated as an ordinary LD/ST, i.e. +**MUST** raise exceptions exactly like an ordinary LD/ST. + +For ffirst LD/STs, VL may be truncated arbitrarily to a nonzero value for any implementation-specific reason. For example: it is perfectly reasonable for implementations to alter VL when ffirst LD or ST operations are initiated on a nonaligned boundary, such that within a loop the subsequent iteration of that loop begins subsequent ffirst LD/ST operations on an aligned boundary +such as the beginning of a cache line, or beginning of a Virtual Memory +page. Likewise, to reduce workloads or balance resources. + +Vertical-First Mode is slightly strange in that only one element +at a time is ever executed anyway. Given that programmers may +legitimately choose to alter srcstep and dststep in non-sequential +order as part of explicit loops, it is neither possible nor +safe to make speculative assumptions about future LD/STs. +Therefore, Fail-First LD/ST in Vertical-First is `UNDEFINED`. +This is very different from Arithmetic (Data-dependent) FFirst +where Vertical-First Mode is deterministic, not speculative. + # LOAD/STORE Elwidths Loads and Stores are almost unique in that the OpenPOWER Scalar ISA