The encoding for this instruction embeds static predication into the
swizzle as well as constants 1/1.0 and 0/0.0
+An extremely important aspect of 3D GPU workloads is that the source
+and destination subvector lengths may be *different*. For example, a
+contiguous array of vec3 may have only 2 elements of each vec3
+swizzle-copied to a contiguous array of vec2. Swizzle Moves support
+independent subvector lengths.
+
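As a rough illustration of independent subvector lengths, the sketch below copies two elements out of every vec3 in a flat array into a flat array of vec2. The function name `swizzle_copy_array` and the `picks` parameter are hypothetical and used only for this example; they model the idea in plain Python and say nothing about the actual instruction encoding.

```python
def swizzle_copy_array(src_flat, src_len, dest_len, picks):
    """Copy selected elements of each source subvector into a new flat array.

    src_flat : flat list holding contiguous subvectors of length src_len
    picks    : picks[i] is the index, within each source subvector, of the
               element placed at destination position i (hypothetical name)
    """
    if len(picks) != dest_len:
        raise ValueError("need exactly one selector per destination element")
    if len(src_flat) % src_len != 0:
        raise ValueError("source array is not a whole number of subvectors")
    out = []
    for base in range(0, len(src_flat), src_len):
        sub = src_flat[base:base + src_len]
        # dest[i] = src[picks[i]], applied per subvector
        out.extend(sub[p] for p in picks)
    return out


# Two vec3 values (X, Y, Z each); keep X and Z of each as a vec2.
vec3_array = [1, 2, 3, 4, 5, 6]
vec2_array = swizzle_copy_array(vec3_array, src_len=3, dest_len=2, picks=[0, 2])
```

Here the source subvector length (3) and destination subvector length (2) differ, which is the case the paragraph above describes.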
Although conceptually similar to `vpermd` of Packed SIMD VSX,
Swizzle Moves come in immediate-only form with only up to four
selectors, where VSX refers to individual bytes and may not
|pixel |R | G | B | A |
|index |0 | 1 | 2 | 3 |
-In very simplistic terms the relationship between swizzle indices,
-source, and destination is:
-
- dest[i] = src[swiz[i]]
-
The options for each Swizzle are:
* 0b000 to indicate "skip". this is equivalent to predicate masking
* 0b011 to indicate "constant 1" (or 1.0)
* 0b1NN index 0 thru 3 to copy from subelement in pos XYZW
-Note that 7 options are needed (not 6) because the 7th option allows static
-predicate masking to be encoded within the swizzle immediate.
-For example this allows "W.Y." to specify: "copy W to position X,
+In very simplistic terms the relationship between swizzle indices
+(NN, above), source, and destination is:
+
+ dest[i] = src[swiz[i]]
+
+Note that 7 options are needed (not 6) because option 0b000 allows static
+predicate masking (skipping) to be encoded within the swizzle immediate.
+For example it allows "W.Y." to specify: "copy W to position X,
and Y to position Z, leave the other two positions Y and W unaltered"
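The "W.Y." case can be modelled with a minimal Python sketch. It handles only the three selector options listed above (0b000 skip, 0b011 constant 1, 0b1NN index); any other selector value raises an error rather than guessing at its meaning. The names `swizzle_move`, `SKIP` and `CONST_ONE` are assumptions made for this illustration, not identifiers from the specification.

```python
SKIP = 0b000       # leave the destination position unaltered
CONST_ONE = 0b011  # write constant 1 (or 1.0)


def swizzle_move(src, dest, swiz):
    """Return a copy of dest after applying one selector per destination slot."""
    out = list(dest)
    for i, sel in enumerate(swiz):
        if sel == SKIP:
            continue                    # static predicate masking
        elif sel == CONST_ONE:
            out[i] = 1
        elif sel & 0b100:
            out[i] = src[sel & 0b011]   # dest[i] = src[NN]
        else:
            # Selector values not listed in this section are not modelled.
            raise NotImplementedError(f"selector {sel:#05b} not modelled")
    return out


# Selectors for source positions X, Y, Z, W (0b1NN with NN = 0..3).
X, Y, Z, W = 0b100, 0b101, 0b110, 0b111

# "W.Y.": copy W to position X and Y to position Z, skip the other two.
result = swizzle_move(src=["x", "y", "z", "w"],
                      dest=["a", "b", "c", "d"],
                      swiz=[W, SKIP, Y, SKIP])
```

After the call, positions 0 and 2 of `result` hold the source W and Y elements, while positions 1 and 3 keep their original destination values.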
0 1 2 3
by half.
When part of the Scalar Power ISA (not SVP64 Vectorised)
mv.swiz and fmv.swiz operate on four 32-bit
-quantities, reducing this instruction to 2-in, 2-out pairs of 64-bit
-registers:
+quantities, reducing this instruction to a feasible
+2-in, 2-out operation on pairs of 64-bit registers:
| swizzle name | source | dest | half |
|-- | -- | -- | -- |
Also, making life easier, RT and RA are only permitted to be even
(no overlapping can occur). This makes RT (and RA) a "pair" exactly
-like `lq` and `stq`. Swizzle instructions must be atomically indivisible:
+as in `lq` and `stq`. Swizzle instructions must be atomically indivisible:
an Exception or Interrupt may not occur during the pair of Moves.
**SVP64 Vectorised**
Implementors must consider Swizzle instructions to be atomically indivisible,
even if implemented as Micro-coded. The rest of SVP64 permits element-level
-operations to be Precise-Interrupted: *Swizzle moves do not*. All XYZW
+operations to be Precise-Interrupted: *Swizzle moves do not* because
+the multiple moves are part of the same instruction. All XYZW
elements *must* be completed in full before any Trap or Interrupt is
permitted
to be serviced. Out-of-Order Micro-architectures may of course cancel