From 57a06a1ba4cf96242512e7ede85389f34606ee15 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Thu, 2 Jun 2022 12:18:21 +0100
Subject: [PATCH]

---
 openpower/sv/svp64_quirks.mdwn | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/svp64_quirks.mdwn b/openpower/sv/svp64_quirks.mdwn
index b68c3168f..ad9beddc4 100644
--- a/openpower/sv/svp64_quirks.mdwn
+++ b/openpower/sv/svp64_quirks.mdwn
@@ -52,6 +52,8 @@ makes no sense at all, such as `sc` or `mtmsr`). The categories are:
 * Condition Register Field operations
 * branch
 
+**Arithmetic**
+
 Arithmetic (known as "normal" mode) is where Scalar and Parallel
 Reduction can be done: Saturation as well, and two new innovative
 modes for Vector ISAs: data-dependent fail-first and predicate result.
@@ -62,6 +64,8 @@ getting used to, as it may result in invalid results, but ultimately
 it is critical to think in terms of the "rules", that everything is
 Scalar instructions in strict Program Order.
 
+**Branches**
+
 Branch is the one and only place where the Scalar
 (non-prefixed) operations differ from the Vector (element)
 instructions, as explained in a separate section.
@@ -74,6 +78,8 @@ order to support a wide range of parallel boolean condition options
 which are expected of a Vector / GPU ISA. These save a considerable
 number of instructions in tight inner loop situations.
 
+**CR Field Ops**
+
 Condition Register Fields are 4-bit wide and consequently element-width
 overrides make absolutely no sense whatsoever. Therefore the elwidth
 override field bits can be used for other purposes when Vectorising
@@ -85,9 +91,27 @@ All of these differences, which require quite a lot of logical
 reasoning and deduction, help explain why there is an entirely different
 CR ops Vectorisation Category.
 
+**Load/Store**
+
 LOAD/STORE is another area that has different needs: this time it is
 down to limitations in Scalar LD/ST. Vector ISAs have Load/Store modes
-which simply make no sense in a RISC Scalar ISA: 
+which simply make no sense in a RISC Scalar ISA: element-stride,
+unit-stride and the entire concept of a stride itself (a spacing
+between elements) have no place at all in a Scalar ISA. The problems
+come when trying to *retrofit* the concept of "Vector Elements" onto
+a Scalar ISA: it took a couple of bits (Modes) in the SVP64 RM Prefix
+to convey the stride mode, which in turn changes the Effective Address
+computation. Worth noting for Hardware designers: it turned out to be
+possible to pre-multiply the D/DS Immediate by the stride amount, so
+that the LD/ST Pipeline itself does not need to be modified: it simply
+receives an adjusted Immediate.
+
+Another area where LD/ST went quirky is element-width overrides, especially
+when combined with Saturation, given that LD/ST operations have byte,
+halfword, word, dword and quad variants. The interaction between these
+operation widths and the source and destination elwidth overrides was
+particularly convoluted and hard to derive: some care is advised when
+reading this part of the specification.
 
 # CR weird instructions
 
-- 
2.30.2