From c210ed731dbf08213f3396dfbb9f021679994d13 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 19 Aug 2022 06:30:38 +0100
Subject: [PATCH]

---
 openpower/sv/ldst.mdwn | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index 0098bb781..99c768d06 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -138,9 +138,8 @@ destination.
 Just like Cache-inhibited LDs, multiple values may be written out in quick succession to a memory-mapped peripheral from sequentially-numbered registers.
 
-Note that there are no immediate versions of cache-inhibited LD/ST
-(no *Scalar* cache-inhibited immediate instructions to Vectorise).
-A future version of the Power ISA *may* have such Scalar instructions.
+Note that any memory location may be Cache-inhibited
+(Power ISA v.1, Book III, 1.6.1, p1033)
 
 **LD/ST Indexed**
 
@@ -178,9 +177,9 @@ set RB is sign-extended from elwidth bits to the full 64 bits.
 For other Modes (ffirst, saturate), all EA computation with elwidth overrides is unsigned.
 
-Note that cache-inhibited LD/ST (`ldcix`) when VSPLAT is activated will perform **multiple** LD/ST operations, sequentially. `ldcix` even with scalar src will read the same memory location *multiple times*, storing the result in successive Vector destination registers. This because the cache-inhibit instructions are used to read and write memory-mapped peripherals.
-If a genuine cache-inhibited LD-VSPLAT is required then a *scalar*
+Note that cache-inhibited LD/ST when VSPLAT is activated will perform **multiple** LD/ST operations, sequentially. Even with scalar src a
+Cache-inhibited LD will read the same memory location *multiple times*, storing the result in successive Vector destination registers. This because the cache-inhibit instructions are typically used to read and write memory-mapped peripherals.
+If a genuine cache-inhibited LD-VSPLAT is required then a single *scalar*
 cache-inhibited LD should be performed, followed by a VSPLAT-augmented mv, copying the one *scalar* value into multiple register destinations.
 
@@ -187,11 +187,7 @@ Note also that cache-inhibited VSPLAT with Predicate-result is possible.
 This allows for example to issue a massive batch of memory-mapped peripheral reads, stopping at the first NULL-terminated character and truncating VL to that point. No branch is needed to issue that large burst
-of LDs.
-
-The multiple reads/writes to/from the same destination address is,
-in Vector-Indexed LD/ST, very similar to the relaxed constraints of
-mapreduce mode,
+of LDs, which may be valuable in Embedded scenarios.
 
 # Vectorisation of Scalar Power ISA v3.0B
-- 
2.30.2
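
Reviewer note: the behaviour described in the second and third hunks can be sketched with a small Python model. This is only an illustration under stated assumptions, not SVP64 pseudocode: `Peripheral`, `ci_load_vsplat`, `scalar_ci_load_then_splat` and `ci_load_until_nul` are hypothetical names, the peripheral is modelled as a FIFO where each read consumes one byte, and the truncated VL is assumed to exclude the NUL element.

```python
from typing import List, Tuple


class Peripheral:
    """Hypothetical memory-mapped FIFO: every read consumes the next byte."""

    def __init__(self, data: bytes) -> None:
        self._data = list(data)
        self.reads = 0  # number of bus reads actually performed

    def read(self) -> int:
        self.reads += 1
        return self._data.pop(0) if self._data else 0


def ci_load_vsplat(dev: Peripheral, vl: int) -> List[int]:
    """Cache-inhibited LD with a scalar source: the same address is read
    VL times, one result per successive destination register."""
    return [dev.read() for _ in range(vl)]


def scalar_ci_load_then_splat(dev: Peripheral, vl: int) -> List[int]:
    """Genuine splat: one scalar cache-inhibited LD, then a splat-style
    copy of that single value into VL destination registers."""
    value = dev.read()
    return [value] * vl


def ci_load_until_nul(dev: Peripheral, vl: int) -> Tuple[List[int], int]:
    """Reads that stop at the first NUL, returning the registers and the
    truncated VL (assumption: the NUL itself is not counted)."""
    regs: List[int] = []
    for _ in range(vl):
        value = dev.read()
        if value == 0:
            break
        regs.append(value)
    return regs, len(regs)
```

In this model the first function performs VL bus reads and the second performs exactly one, which is the distinction the patch text draws between a Vectorised cache-inhibited LD and a scalar LD followed by a VSPLAT mv.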