(no commit message)

author lkcl <lkcl@web>

Tue, 14 Jun 2022 00:19:31 +0000 (01:19 +0100)

committer IkiWiki <ikiwiki.info>

Tue, 14 Jun 2022 00:19:31 +0000 (01:19 +0100)
diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn

index 2d7b3986487282e76d9aaa49c9cf230c46411256..53e615ee4285fe9eecc2a4a1089a719e184e9532 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -17,6 +17,11 @@ A compromise is to provide a Swizzle "Move".
  The encoding for this instruction embeds static predication into the
  swizzle as well as constants 1/1.0 and 0/0.0
  
+An extremely important aspect of 3D GPU workloads is that the source
+and destination subvector lengths may be *different*.  A contiguous
+array of vec3 may have only 2 of each vec3's elements swizzle-copied to a contiguous
+array of vec2. Swizzle Moves support independent subvector lengths.
+
  Although conceptually similar to `vpermd` of Packed SIMD VSX,
  Swizzle Moves come in immediate-only form with only up to four
  selectors, where VSX refers to individual bytes and may not
@@ -44,11 +49,6 @@ to each:
  |pixel  |R   | G  | B | A  |
  |index  |0   | 1  | 2 | 3  |
  
-In very simplistic terms the relationship between swizzle indices,
-source, and destination is:
-
-    dest[i] = src[swiz[i]]
-
  The options for each Swizzle are:
  
  * 0b000 to indicate "skip".  this is equivalent to predicate masking
@@ -57,9 +57,14 @@ The options for each Swizzle are:
  * 0b011 to indicate "constant 1" (or 1.0)
  * 0b1NN index 0 thru 3 to copy from subelement in pos XYZW
  
-Note that 7 options are needed (not 6) because the 7th option allows static 
-predicate masking to be encoded within the swizzle immediate.
-For example this allows "W.Y." to specify: "copy W to position X,
+In very simplistic terms the relationship between swizzle indices
+(NN, above), source, and destination is:
+
+    dest[i] = src[swiz[i]]
+
+Note that 7 options are needed (not 6) because option 0b000 allows static 
+predicate masking (skipping) to be encoded within the swizzle immediate.
+For example it allows "W.Y." to specify: "copy W to position X,
  and Y to position Z, leave the other two positions Y and W unaltered"
  
      0    1    2    3
@@ -80,8 +85,8 @@ ISA this not practical. A compromise is to cut the registers required
  by half.
  When part of the Scalar Power ISA (not SVP64 Vectorised)
  mv.swiz and fmv.swiz operate on four 32-bit
-quantities, reducing this instruction to 2-in, 2-out pairs of 64-bit
-registers:
+quantities, reducing this instruction to a feasible
+2-in, 2-out form using pairs of 64-bit registers:
  
  | swizzle name | source | dest | half    |
  |--            | --     | --   | --      |
@@ -100,7 +105,7 @@ copy the contents RA+1 into RT, but set RT+1 to zero.
  
  Also, making life easier, RT and RA are only permitted to be even
  (no overlapping can occur).  This makes RT (and RA) a "pair" exactly
-like `lq` and `stq`.  Swizzle instructions must be atomically indivisible:
+as in `lq` and `stq`.  Swizzle instructions must be atomically indivisible:
  an Exception or Interrupt may not occur during the pair of Moves.
  
  **SVP64 Vectorised**
@@ -146,7 +151,8 @@ Swizzle are entirely optional in hardware at the Embedded Level.*
  
  Implementors must consider Swizzle instructions to be atomically indivisible,
  even if implemented as Micro-coded.  The rest of SVP64 permits element-level
-operations to be Precise-Interrupted: *Swizzle moves do not*.  All XYZW
+operations to be Precise-Interrupted: *Swizzle moves do not* because
+the multiple moves are part of the same instruction.  All XYZW
  elements *must* be completed in full before any Trap or Interrupt is
  permitted
  to be serviced. Out-of-Order Micro-architectures may of course cancel
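
The relationship `dest[i] = src[swiz[i]]` and the "W.Y." example from the hunks above can be sketched in Python. This is an illustrative model only, not the specification's pseudocode: the names `swizzle_move`, `SKIP` and `CONST_ONE` are hypothetical, elements are assumed to be ordered X=0, Y=1, Z=2, W=3, and only the selector encodings visible in this diff (0b000, 0b011, 0b1NN) are modelled. Any other selector value raises an error, because its meaning is not shown in this excerpt.

```python
# Illustrative model of a 4-element swizzle move (assumed names, not from the spec).
SKIP = 0b000       # "skip": destination element is left unaltered
CONST_ONE = 0b011  # "constant 1" (or 1.0)


def swizzle_move(dest, src, swiz):
    """Return a copy of dest with each swizzle selector applied.

    Each selector is a 3-bit value: 0b000 skips the element,
    0b011 writes the constant 1, and 0b1NN copies src[NN].
    """
    out = list(dest)
    for i, sel in enumerate(swiz):
        if sel == SKIP:
            continue                      # static predicate masking
        if sel == CONST_ONE:
            out[i] = 1
        elif sel & 0b100:
            out[i] = src[sel & 0b011]     # dest[i] = src[swiz[i]]
        else:
            # Encodings not listed in this excerpt are deliberately rejected.
            raise ValueError(f"selector {sel:#05b} not modelled in this sketch")
    return out


# "W.Y.": copy W to position X and Y to position Z, skip the other two.
src = [10, 20, 30, 40]                    # X, Y, Z, W
dest = [1, 2, 3, 4]
result = swizzle_move(dest, src, [0b111, 0b000, 0b101, 0b000])
print(result)
```

Here positions Y and W of the destination keep their previous values, which is the behaviour that option 0b000 exists to provide.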