The encoding for this instruction embeds static predication into the
swizzle as well as constants 1/1.0 and 0/0.0.
+Although conceptually similar to `vpermd` of Packed SIMD VSX,
+Swizzle Moves come in immediate-only form with at most four
+selectors, whereas VSX selects individual bytes and cannot
+copy constants to the destination.
+3D Shader programs commonly use the letters "XYZW"
+when referring to the four swizzle indices, and often
+use the letters "RGBA"
+when referring to pixel data.
+
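As a rough illustration of the idea above (element selection, constants, and static predication folded into one operation), the following Python sketch applies up to four selectors to a source subvector. The string spelling of the selectors (`X`/`Y`/`Z`/`W`, `0`, `1`, and `_` for "leave untouched") and the function name `swizzle` are assumptions made for this example only; the actual instruction carries each selector in an immediate field.

```python
# Hypothetical helper, for illustration only: selectors are spelled as
# characters here, whereas the real instruction encodes them as immediates.
def swizzle(src, selectors, dest=None):
    """Return the destination subvector after applying the selectors."""
    index = {"X": 0, "Y": 1, "Z": 2, "W": 3}   # same as R, G, B, A
    consts = {"0": 0, "1": 1}                  # constants 0 and 1
    out = list(dest) if dest is not None else [None] * len(selectors)
    for i, sel in enumerate(selectors):
        if sel == "_":
            # static predication: this destination element is skipped
            continue
        if sel in consts:
            out[i] = consts[sel]
        else:
            out[i] = src[index[sel]]
    return out


vec = [10, 20, 30, 40]
print(swizzle(vec, "WZYX"))                      # reversed order
print(swizzle(vec, "X01_", dest=[0, 0, 0, 99]))  # last element untouched
```

The second call shows why constants and predication are useful together: one destination element is copied, two are set to constants, and one keeps its previous value.
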
# Format
| 0.5 |6.10|11.15|16.27|28.31| name | Form |
| imm |0.2 |3.5 |6.8|9.11|
|-------|----|----|---|----|
|swizzle|X | Y | Z | W |
+|pixel |R | G | B | A |
|index |0 | 1 | 2 | 3 |
The options for each Swizzle are:
quantities, reducing this instruction to 2-in, 2-out pairs of 64-bit
registers:
-| swizzle name | source | dest | half |
+| swizzle name | source | dest | half |
|-- | -- | -- | -- |
| X | RA | RT | lo-half |
| Y | RA | RT | hi-half |
**SVP64 Vectorised**
+Vectorised Swizzle may be considered to be an extended static predicate
+mask for subvectors (SUBVL=2/3/4). SUBVL (and SRC_SUBVL, see later section)
+must be set in order to aid in determining source and destination subvector
+lengths.
+
When Vectorised, given the use-case is for a High-performance GPU,
the fundamental assumption is that Micro-coding or
other technique will
to be serviced. Out-of-Order Micro-architectures may of course cancel
the in-flight instruction as usual if the Interrupt requires fast servicing.
+Swizzle Pseudocode (when SRC_SUBVL=SUBVL):
+
+```
+```
+
+
# RM Mode Concept:
MVRM-2P-2S1D:
            for i in range(VL):
                yield i+VL*j
-    # walk through both source and dest indices simultaneously
-    for src_idx, dst_idx in zip(index_src(), index_dst()):
-        move_operation(RT+dst_idx, RA+src_idx)
+    # inner looping when SUBVLs are equal
+    if SRC_SUBVL == SUBVL:
+        for idx in index():
+            move_operation(RT+idx, RA+idx)
+    else:
+        # walk through both source and dest indices simultaneously
+        for src_idx, dst_idx in zip(index_src(), index_dst()):
+            move_operation(RT+dst_idx, RA+src_idx)
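
The walk above can be tried out as a small runnable Python sketch. The generator `index(vl, subvl)`, the function `vector_move`, and the flat list standing in for the register file are all assumptions made for this example; in particular the element ordering inside the generator is only a guess based on the `i+VL*j` expression shown above. The point illustrated is that `zip` stops as soon as the shorter of the two index streams is exhausted.

```python
def index(vl, subvl):
    """Yield flat element offsets (assumed ordering: outer loop over SUBVL)."""
    for j in range(subvl):
        for i in range(vl):
            yield i + vl * j


def vector_move(regs, rt, ra, vl, subvl, src_subvl):
    """Copy elements starting at RA to elements starting at RT.

    Returns the list of (source, destination) register numbers moved.
    """
    if src_subvl == subvl:
        # inner looping when SUBVLs are equal: one shared index stream
        pairs = ((idx, idx) for idx in index(vl, subvl))
    else:
        # walk through both source and dest indices simultaneously;
        # zip ends when the shorter stream runs out
        pairs = zip(index(vl, src_subvl), index(vl, subvl))
    moves = []
    for src_idx, dst_idx in pairs:
        regs[rt + dst_idx] = regs[ra + src_idx]   # the move_operation
        moves.append((ra + src_idx, rt + dst_idx))
    return moves


regs = list(range(32))   # toy register file: register n initially holds n
moves = vector_move(regs, rt=16, ra=0, vl=2, subvl=2, src_subvl=3)
print(moves)
print(regs[16:20])
```

With `vl=2`, `subvl=2` and `src_subvl=3` the source stream has six entries and the destination stream four, so only four moves take place.
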
"yield" from python is used here for simplicity and clarity.
The two Finite State Machines for the generation of the source