From c210ed731dbf08213f3396dfbb9f021679994d13 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Fri, 19 Aug 2022 06:30:38 +0100
Subject: [PATCH]

---
 openpower/sv/ldst.mdwn | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)
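
The second hunk below distinguishes two behaviours: a Cache-inhibited LD with a scalar source under VSPLAT, which reads the same location VL times, and a single scalar Cache-inhibited LD followed by a VSPLAT-augmented mv. The following is a minimal behavioural sketch in Python, not simulator or specification code. The names `FifoPeripheral`, `ci_load_vsplat` and `scalar_ci_load_then_splat` are hypothetical and exist only for illustration, and the peripheral is assumed to be a FIFO whose reads have side-effects.

```python
class FifoPeripheral:
    """Hypothetical memory-mapped FIFO: every read pops the next value."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0  # number of bus reads actually performed

    def read(self, addr):
        # addr is ignored: the model has a single data register
        self.reads += 1
        return self.data.pop(0) if self.data else 0


def ci_load_vsplat(read_mmio, addr, vl):
    """Cache-inhibited LD, scalar src, VSPLAT active: VL sequential reads
    of the same address, one result per successive destination register."""
    return [read_mmio(addr) for _ in range(vl)]


def scalar_ci_load_then_splat(read_mmio, addr, vl):
    """One scalar cache-inhibited LD, then a splat-style copy of that one
    value into VL destination registers (the 'genuine' LD-VSPLAT)."""
    value = read_mmio(addr)
    return [value] * vl


if __name__ == "__main__":
    p = FifoPeripheral([10, 20, 30, 40])
    print(ci_load_vsplat(p.read, 0x1000, 4), p.reads)
    q = FifoPeripheral([10, 20, 30, 40])
    print(scalar_ci_load_then_splat(q.read, 0x1000, 4), q.reads)
```

In this model the first form drains four FIFO entries while the second touches the peripheral exactly once, which is why the text recommends the scalar LD plus mv when only one read is wanted.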

diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index 0098bb781..99c768d06 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -138,9 +138,8 @@ destination. Just like Cache-inhibited LDs, multiple values may be
 written out in quick succession to a memory-mapped peripheral from
 sequentially-numbered registers.
 
-Note that there are no immediate versions of cache-inhibited LD/ST
-(no *Scalar* cache-inhibited immediate instructions to Vectorise).
-A future version of the Power ISA *may* have such Scalar instructions.
+Note that any memory location may be Cache-inhibited
+(Power ISA v.1, Book III, 1.6.1, p1033).
 
 **LD/ST Indexed**
 
@@ -178,8 +177,9 @@ set RB is sign-extended from elwidth bits to the full 64
 bits.  For other Modes (ffirst, saturate),
 all EA computation with elwidth overrides is unsigned.
 
-Note that cache-inhibited LD/ST (`ldcix`) when VSPLAT is activated will perform **multiple** LD/ST operations, sequentially.  `ldcix` even with scalar src will read the same memory location *multiple times*, storing the result in successive Vector destination registers.  This because the cache-inhibit instructions are used to read and write memory-mapped peripherals.
-If a genuine cache-inhibited LD-VSPLAT is required then a *scalar*
+Note that cache-inhibited LD/ST when VSPLAT is activated will perform **multiple** LD/ST operations, sequentially.  Even with a scalar src, a
+Cache-inhibited LD will read the same memory location *multiple times*, storing the result in successive Vector destination registers.  This is because the cache-inhibit instructions are typically used to read and write memory-mapped peripherals.
+If a genuine cache-inhibited LD-VSPLAT is required then a single *scalar*
 cache-inhibited LD should be performed, followed by a VSPLAT-augmented mv,
 copying the one *scalar* value into multiple register destinations.
 
@@ -187,11 +187,7 @@ Note also that cache-inhibited VSPLAT with Predicate-result is possible.
 This allows for example to issue a massive batch of memory-mapped
 peripheral reads, stopping at the first NULL-terminated character and
 truncating VL to that point. No branch is needed to issue that large burst
-of LDs.
-
-The multiple reads/writes to/from the same destination address is,
-in Vector-Indexed LD/ST, very similar to the relaxed constraints of
-mapreduce mode,
+of LDs, which may be valuable in Embedded scenarios.
 
 # Vectorisation of Scalar Power ISA v3.0B
 
-- 
2.30.2