From b8e3ff2db3f1dfdb8f818ac7af5c83e26a0c7349 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Tue, 14 Jun 2022 01:19:31 +0100
Subject: [PATCH]

---
 openpower/sv/mv.swizzle.mdwn | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 2d7b39864..53e615ee4 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -17,6 +17,11 @@ A compromise is to provide a Swizzle "Move". The encoding for this instruction
 embeds static predication into the swizzle as well as
 constants 1/1.0 and 0/0.0
 
+An extremely important aspect of 3D GPU workloads is that the source
+and destination subvector lengths may be *different*. A Vector that is a
+contiguous array of vec3 may have only 2 elements of each vec3 swizzle-copied
+to a contiguous array of vec2. Swizzle Moves support independent subvector lengths.
+
 Although conceptually similar to `vpermd` of Packed SIMD VSX,
 Swizzle Moves come in immediate-only form with only up to four
 selectors, where VSX refers to individual bytes and may not
@@ -44,11 +49,6 @@ to each:
 |pixel |R | G | B | A |
 |index |0 | 1 | 2 | 3 |
 
-In very simplistic terms the relationship between swizzle indices,
-source, and destination is:
-
-    dest[i] = src[swiz[i]]
-
 The options for each Swizzle are:
 
 * 0b000 to indicate "skip". this is equivalent to predicate masking
@@ -57,9 +57,14 @@ The options for each Swizzle are:
 * 0b011 to indicate "constant 1" (or 1.0)
 * 0b1NN index 0 thru 3 to copy from subelement in pos XYZW
 
-Note that 7 options are needed (not 6) because the 7th option allows static
-predicate masking to be encoded within the swizzle immediate.
-For example this allows "W.Y." to specify: "copy W to position X,
+In very simplistic terms the relationship between swizzle indices
+(NN, above), source, and destination is:
+
+    dest[i] = src[swiz[i]]
+
+Note that 7 options are needed (not 6) because option 0b000 allows static
+predicate masking (skipping) to be encoded within the swizzle immediate.
+For example it allows "W.Y." to specify: "copy W to position X,
 and Y to position Z, leave the other two positions Y and W unaltered"
 
     0 1 2 3
@@ -80,8 +85,8 @@ ISA this not practical. A compromise is to cut the registers required by half.
 
 When part of the Scalar Power ISA (not SVP64 Vectorised) mv.swiz and
 fmv.swiz operate on four 32-bit
-quantities, reducing this instruction to 2-in, 2-out pairs of 64-bit
-registers:
+quantities, reducing this instruction to a feasible
+2-in, 2-out pairs of 64-bit registers:
 
 | swizzle name | source | dest | half |
 |-- | -- | -- | -- |
@@ -100,7 +105,7 @@ copy the contents RA+1 into RT, but set RT+1 to zero.
 
 Also, making life easier, RT and RA are only permitted to be even
 (no overlapping can occur). This makes RT (and RA) a "pair" exactly
-like `lq` and `stq`. Swizzle instructions must be atomically indivisible:
+as in `lq` and `stq`. Swizzle instructions must be atomically indivisible:
 an Exception or Interrupt may not occur during the pair of Moves.
 
 **SVP64 Vectorised**
@@ -146,7 +151,8 @@ Swizzle are entirely optional in hardware at the Embedded Level.*
 
 Implementors must consider Swizzle instructions to be atomically indivisible,
 even if implemented as Micro-coded. The rest of SVP64 permits element-level
-operations to be Precise-Interrupted: *Swizzle moves do not*. All XYZW
+operations to be Precise-Interrupted: *Swizzle moves do not* because
+the multiple moves are part of the same instruction. All XYZW
 elements *must* be completed in full before any Trap or Interrupt is
 permitted to be serviced. Out-of-Order Micro-architectures may of
 course cancel
-- 
2.30.2
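
Reviewer note: the relationship `dest[i] = src[swiz[i]]` and the per-position options described in this patch can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the specification. The function name `swizzle_move` and the constant names are hypothetical. The encoding 0b010 for "constant 0" is assumed, because that bullet lies outside the hunks shown here. Rejecting any other selector value is also an assumption.

```python
# Hypothetical model of one Swizzle Move; not taken from the specification.
SKIP = 0b000      # static predicate mask: leave the dest element unaltered
CONST_0 = 0b010   # ASSUMPTION: "constant 0" encoding is not visible in the patch
CONST_1 = 0b011   # "constant 1" (or 1.0)
X, Y, Z, W = 0b100, 0b101, 0b110, 0b111   # 0b1NN: copy src subelement NN


def swizzle_move(dest, src, swiz):
    """Return a copy of dest with one selector applied per dest position."""
    result = list(dest)
    for i, sel in enumerate(swiz):
        if sel == SKIP:
            continue                      # position i keeps its old value
        if sel == CONST_0:
            result[i] = 0
        elif sel == CONST_1:
            result[i] = 1
        elif sel & 0b100:
            result[i] = src[sel & 0b011]  # dest[i] = src[NN]
        else:
            # ASSUMPTION: the remaining encoding is treated as invalid here
            raise ValueError(f"unsupported selector {sel:#05b}")
    return result


# "W.Y." : copy W to position X, Y to position Z, skip the other two.
src = [10, 11, 12, 13]
dest = [90, 91, 92, 93]
print(swizzle_move(dest, src, [W, SKIP, Y, SKIP]))

# Independent subvector lengths: a vec3 source swizzle-copied to a vec2 dest.
print(swizzle_move([0, 0], [1, 2, 3], [Z, X]))
```

The model returns a new list rather than writing in place, so no intermediate state is ever visible to the caller. That loosely mirrors the requirement that all XYZW elements complete before a Trap or Interrupt is serviced, although real hardware would of course achieve it differently.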