From 84175a7cfc34beb917975c789a95831644b998d1 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Sun, 18 Sep 2022 10:51:53 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls001.mdwn | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 6f02468e1..37acef3f7 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -344,14 +344,14 @@ mask `sv.bc/pm=r3`.
 
 Traditional Vector ISAs have vastly more (and more complex) addressing
 modes: unit strided, element strided, Indexed, Structure Packing. All
-of these had to be jammed in on top of **existing Scalar instructions
-without modifying the Scalar instructions**.  A small conceptual
+of these had to be jammed in on top of existing Scalar instructions
+**without modifying the Scalar instructions**.  A small conceptual
 "cheat" was therefore needed.  The Immediate (D) is in some Modes
 multiplied by the element index, which gives us element-strided.
 For unit-strided the width of the operation (`ld`, 8 byte) is taken
-as the multiplier.  Hardware-level modifications to support this
+as the multiplier.  Modifications to support this
 "cheat" on top of pre-existing Scalar HDL (and Simulators)
-have both turned out to be minimal.
+have both turned out to be minimal.[^mul]
 
 Also added was the option to perform signed or unsigned Effective
 Address calculation, which comes into play only on LD/ST Indexed,
@@ -374,7 +374,8 @@ Effective Address* in one instruction.
 For DCT and FFT, normally it is very expensive to perform the
 "bit-inversion" needed for address calculation and/or reordering
 of elements.  DCT in particular needs both bit-inversion *and
-Gray-Coding* offsets.  DCT/FFT REMAP **automatically** performs
+Gray-Coding* offsets (a complexity that often "justifies" full
+assembler loop-unrolling).  DCT/FFT REMAP **automatically** performs
 the required offset adjustment to get data loaded and stored in
 the required order.  Matrix REMAP can likewise perform up to 3
 Dimensions of reordering (on both Immediate and Indexed), and
@@ -384,7 +385,8 @@ four dimensions (four nested fixed size loops).
 Overall the LD/ST Modes available are extremely powerful, especially
 when combining arithmetic (lharx) with saturation, element-width overrides,
 vec2/3/4 Structure Packing *and* REMAP, the combinations far exceed anything
-seen in any other Vector ISA in history.
+seen in any other Vector ISA in history, yet are really nothing more
+than concepts abstracted out in pure RISC form.[^ldstcisc]
 
 # SVP64Single 24-bits
 
@@ -1179,3 +1181,5 @@ operations.
 [^hphint]: intended for use when the compiler has determined the extent of Memory or register aliases in loops: `a[i] += a[i+4]` would necessitate a Vertical-First hphint of 4
 [^svshape]: although SVSHAPE0-3 should, realistically, be regarded as high a priority as SVSTATE, and given corresponding SVSRR and SVLR equivalents, it was felt that having to context-switch **five** SPRs on Interrupts and function calls was too much.
 [^whoops]: two efforts were made to mix non-uniform encodings into Simple-V space: one deliberate to see how it would go, and one accidental. They both went extremely badly, the deliberate one costing two months to add then remove.
+[^mul]: Setting this "multiplier" to 1 remarkably leaves pre-existing Scalar behaviour completely intact as a degenerate case.
+[^ldstcisc]: At least the CISC "auto-increment" modes of the CDC 6600 and Motorola 68000 are not present! Although these would be fun to introduce, they unfortunately make for 3-in 3-out register profiles, all 64-bit, which explains why the 6600 and 68000 had separate dedicated address regfiles.
-- 
2.30.2