From b8e3ff2db3f1dfdb8f818ac7af5c83e26a0c7349 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Tue, 14 Jun 2022 01:19:31 +0100
Subject: [PATCH]

---
 openpower/sv/mv.swizzle.mdwn | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 2d7b39864..53e615ee4 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -17,6 +17,11 @@ A compromise is to provide a Swizzle "Move".
 The encoding for this instruction embeds static predication into the
 swizzle as well as constants 1/1.0 and 0/0.0
 
+An extremely important aspect of 3D GPU workloads is that the source
+and destination subvector lengths may be *different*.  For example, only
+2 elements of each vec3 in a contiguous array may be swizzle-copied to a
+contiguous array of vec2. Swizzle Moves support independent subvector lengths.
+
 Although conceptually similar to `vpermd` of Packed SIMD VSX,
 Swizzle Moves come in immediate-only form with only up to four
 selectors, where VSX refers to individual bytes and may not
@@ -44,11 +49,6 @@ to each:
 |pixel  |R   | G  | B | A  |
 |index  |0   | 1  | 2 | 3  |
 
-In very simplistic terms the relationship between swizzle indices,
-source, and destination is:
-
-    dest[i] = src[swiz[i]]
-
 The options for each Swizzle are:
 
 * 0b000 to indicate "skip".  this is equivalent to predicate masking
@@ -57,9 +57,14 @@ The options for each Swizzle are:
 * 0b011 to indicate "constant 1" (or 1.0)
 * 0b1NN index 0 thru 3 to copy from subelement in pos XYZW
 
-Note that 7 options are needed (not 6) because the 7th option allows static 
-predicate masking to be encoded within the swizzle immediate.
-For example this allows "W.Y." to specify: "copy W to position X,
+In very simple terms, the relationship between swizzle indices
+(NN, above), source, and destination is:
+
+    dest[i] = src[swiz[i]]
+
+Note that 7 options are needed (not 6) because option 0b000 allows static
+predicate masking (skipping) to be encoded within the swizzle immediate.
+For example, it allows "W.Y." to specify: "copy W to position X,
 and Y to position Z, leave the other two positions Y and W unaltered"
 
     0    1    2    3
@@ -80,8 +85,8 @@ ISA this not practical. A compromise is to cut the registers required
 by half.
 When part of the Scalar Power ISA (not SVP64 Vectorised)
 mv.swiz and fmv.swiz operate on four 32-bit
-quantities, reducing this instruction to 2-in, 2-out pairs of 64-bit
-registers:
+quantities, reducing this instruction to a feasible
+2-in, 2-out form using pairs of 64-bit registers:
 
 | swizzle name | source | dest | half    |
 |--            | --     | --   | --      |
@@ -100,7 +105,7 @@ copy the contents RA+1 into RT, but set RT+1 to zero.
 
 Also, making life easier, RT and RA are only permitted to be even
 (no overlapping can occur).  This makes RT (and RA) a "pair" exactly
-like `lq` and `stq`.  Swizzle instructions must be atomically indivisible:
+as in `lq` and `stq`.  Swizzle instructions must be atomically indivisible:
 an Exception or Interrupt may not occur during the pair of Moves.
 
 **SVP64 Vectorised**
@@ -146,7 +151,8 @@ Swizzle are entirely optional in hardware at the Embedded Level.*
 
 Implementors must consider Swizzle instructions to be atomically indivisible,
 even if implemented as Micro-coded.  The rest of SVP64 permits element-level
-operations to be Precise-Interrupted: *Swizzle moves do not*.  All XYZW
+operations to be Precise-Interrupted: *Swizzle moves do not* because
+the multiple moves are part of the same instruction.  All XYZW
 elements *must* be completed in full before any Trap or Interrupt is
 permitted
 to be serviced. Out-of-Order Micro-architectures may of course cancel
-- 
2.30.2