From: lkcl
Date: Sun, 24 Jan 2021 21:41:13 +0000 (+0000)
Subject: (no commit message)
X-Git-Tag: convert-csv-opcode-to-binary~348
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=f9dc9915b8017ce322f4e1a1359f7677cacad0aa;p=libreriscv.git

---

diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index 20d5f45c0..e3cf4e79f 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -141,9 +141,26 @@ whether stride is unit or element:
         svctx.ldstmode = indexed
     elif els == 0:
         svctx.ldstmode = unitstride
-    else:
+    elif immediate != 0:
         svctx.ldstmode = elementstride
 
+An immediate of zero is a safety-valve to allow `LD-VSPLAT`:
+in effect the multiplication of the immediate-offset by zero results
+in reading from the exact same memory location.
+
+For `LD-VSPLAT`, on non-cache-inhibited Loads, the read can occur
+just the once and be copied, rather than hitting the Data Cache
+multiple times with the same memory read at the same location.
+
+For ST from a vector source onto a scalar destination: with the Vector
+loop effectively creating multiple memory writes to the same location,
+we can deduce that the last of these will be the "successful" one. Thus,
+implementations are free and clear to optimise out the overwriting STs,
+leaving just the last one as the "winner". Bear in mind that predicate
+masks will skip some elements (in source non-zeroing mode).
+
+Note that there are no immediate versions of cache-inhibited LD/ST.
+
 The modes for `RA+RB` indexed version are slightly different:
 
 | 0-1 | 2 | 3 4 | description |
@@ -160,7 +177,7 @@ A summary of the effect of Vectorisation of src or dest:
 
     imm(RA) RT.v RA.v    no stride allowed
     imm(RA) RT.s RA.v    no stride allowed
-    imm(RA) RT.v RA.s    stride-select needed
+    imm(RA) RT.v RA.s    stride-select allowed
     imm(RA) RT.s RA.s    not vectorised
     RA,RB   RT.v RA/RB.v ffirst banned
     RA,RB   RT.s RA/RB.v ffirst banned
@@ -168,7 +185,8 @@ A summary of the effect of Vectorisation of src or dest:
     RA,RB   RT.s RA/RB.s not vectorised
 
 Note that cache-inhibited LD/ST (`ldcix`) when VSPLAT is activated will perform **multiple** LD/ST operations, sequentially. `ldcix` even with scalar src will read the same memory location *multiple times*, storing the result in successive Vector destination registers. This because the cache-inhibit instructions are used to read and write memory-mapped peripherals.
-If a genuine VSPLAT is required then a scalar cache-inhibited LD should be performed, followed by a VSPLAT-augmented mv.
+If a genuine cache-inhibited LD-VSPLAT is required then a *scalar*
+cache-inhibited LD should be performed, followed by a VSPLAT-augmented mv.
 
 # LOAD/STORE Elwidths
 
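
Note on the behaviour described by this patch: the two same-address cases (`LD-VSPLAT`, and ST from a vector source onto a scalar destination) can be modelled in a few lines of Python. This is only an illustrative sketch, not the project's simulator. The names `ld_vsplat`, `st_naive`, `st_last_wins`, the `read` callback and the list-of-booleans `mask` are all hypothetical, and the mask is assumed to be in source non-zeroing mode, so masked-out elements are simply skipped.

```python
def ld_vsplat(read, addr, vl, cache_inhibited=False):
    """Load from a single address into vl destination elements.

    `read` is a hypothetical callable modelling one memory read.
    """
    if cache_inhibited:
        # Cache-inhibited loads (memory-mapped peripherals) must really
        # perform one read per element, sequentially.
        return [read(addr) for _ in range(vl)]
    # Ordinary loads: the read may happen once and the value be copied.
    value = read(addr)
    return [value] * vl


def st_naive(mem, addr, src, mask):
    """Perform every unmasked store, in element order, to the same address."""
    for value, enabled in zip(src, mask):
        if enabled:
            mem[addr] = value


def st_last_wins(mem, addr, src, mask):
    """Perform only the last unmasked store; earlier ones would be overwritten."""
    for value, enabled in zip(reversed(src), reversed(mask)):
        if enabled:
            mem[addr] = value
            return
```

Under these assumptions both store variants leave memory in the same final state, which is the property that permits the optimisation. If every element is masked out, neither variant writes anything.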