[[!tag standards]]

# mv.swizzle

Links

*
*

Swizzle is usually done on a per-operand basis in 3D GPU ISAs, making for extremely long instructions (64 bits or greater). That cost is justified by the high occurrence of Swizzle in 3D Shader Binaries (over 10% of all instructions). A compromise is to provide a Swizzle "Move". The encoding for this instruction embeds static predication into the swizzle, as well as the constants 1/1.0 and 0/0.0.

**As a Scalar instruction**

Given that XYZW Swizzle can select simultaneously between one *and four* register operands, a full version of this instruction would need an eye-popping 8 64-bit operands: 4-in, 4-out. As part of a Scalar ISA this is not practical. A compromise is to cut the registers required by half.

When part of the Scalar Power ISA (not SVP64 Vectorised), mv.swiz and fmv.swiz operate on four 32-bit quantities, reducing this instruction to 2-in, 2-out pairs of 64-bit registers:

| swizzle name | source | dest | half    |
|--------------|--------|------|---------|
| X            | RA     | RT   | lo-half |
| Y            | RA     | RT   | hi-half |
| Z            | RA+1   | RT+1 | lo-half |
| W            | RA+1   | RT+1 | hi-half |

When RA=RT (in-place swizzle), any portion of RT or RT+1 not covered by the Swizzle is unmodified. For example a Swizzle of "..XY" will copy the contents of RA into RT+1 but leave RT unmodified.

When RA!=RT, any part of RT or RT+1 not set as a destination by the Swizzle will be set to zero. A Swizzle of "..XY" would copy the contents of RA into RT+1, but set RT to zero.

Also, making life easier, RT and RA are only permitted to be even (no overlapping can occur). This makes RT (and RA) a "pair" exactly like `lq` and `stq`.

**SVP64 Vectorised**

When Vectorised, the

# Format

| 0.5 | 6.10 | 11.15 | 16.27 | 28.31 | name     |
|-----|------|-------|-------|-------|----------|
| PO  | RTp  | RAp   | imm   | 0011  | mv.swiz  |
| PO  | RTp  | RAp   | imm   | 1011  | fmv.swiz |

This gives a 12-bit immediate in bits 16 to 27:

* 3 bits X
* 3 bits Y
* 3 bits Z
* 3 bits W

The options are:

* 0b000 to indicate "skip".
this is equivalent to predicate masking
* 0b001 is not needed (reserved)
* 0b010 to indicate "constant 0"
* 0b011 to indicate "constant 1" (or 1.0)
* 0b1NN index 0 thru 3 to copy from subelement in pos XYZW

Efforts to encode the 12-bit swizzle in fewer bits proved unsuccessful: 7^4 comes out to 2,401, which is larger than the 2,048 values that 11 bits can hold. Note that 7 options are needed (not 6) because the 7th option allows predicate masking to be encoded within the swizzle immediate. For example this allows "W..Y" to be specified: "copy W to position X, and Y to position W, leave the other two positions Y and Z unaltered".

# RM Mode Concept:

MVRM-2P-2S1D:

| Field Name   | Field bits | Description                          |
|--------------|------------|--------------------------------------|
| Rdest_EXTRA2 | `10:11`    | extends Rdest (R\*\_EXTRA2 Encoding) |
| Rsrc_EXTRA2  | `12:13`    | extends Rsrc (R\*\_EXTRA2 Encoding)  |
| src_SUBVL    | `14:15`    | SUBVL for Source                     |
| MASK_SRC     | `16:18`    | Execution Mask for Source            |

The inclusion of a separate src SUBVL would allow `sv.mv.swiz RT.vecN RA.vecN` to mean either a contiguous sequential copy or zip/unzip (pack/unpack).
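Returning to the swizzle immediate described under Format, the following is a minimal Python sketch of how the four 3-bit selectors might be packed, unpacked and applied in the Scalar (non-Vectorised) integer case. It is illustrative only. The function names (`encode_swizzle`, `decode_swizzle`, `mv_swiz`) are hypothetical, the placement of the X selector in the most significant 3 bits of the immediate is an assumption not stated above, and the four 32-bit quantities are modelled as a plain list `[X, Y, Z, W]` rather than as halves of 64-bit registers.

```python
# Selector values taken from the option list; the names are illustrative.
SKIP, RESERVED, CONST0, CONST1 = 0b000, 0b001, 0b010, 0b011
LANES = "XYZW"


def encode_swizzle(text):
    """Pack a 4-character swizzle such as 'W..Y' into a 12-bit immediate.

    '.' means skip, '0' and '1' are the constants, X/Y/Z/W pick a source lane.
    Assumption: the selector for position X occupies the top 3 bits.
    """
    assert len(text) == 4
    imm = 0
    for ch in text:
        if ch == ".":
            sel = SKIP
        elif ch == "0":
            sel = CONST0
        elif ch == "1":
            sel = CONST1
        else:
            sel = 0b100 | LANES.index(ch)  # 0b1NN: copy from lane NN
        imm = (imm << 3) | sel
    return imm


def decode_swizzle(imm):
    """Split a 12-bit immediate into four 3-bit selectors, X first."""
    return [(imm >> shift) & 0b111 for shift in (9, 6, 3, 0)]


def mv_swiz(src, old_dest, imm, same_reg):
    """Model scalar integer mv.swiz on four lanes [X, Y, Z, W].

    src stands for the RA/RA+1 pair and old_dest for the prior contents of
    RT/RT+1. Skipped lanes keep their old value when RA=RT (same_reg),
    otherwise they are zeroed.
    """
    out = list(old_dest) if same_reg else [0, 0, 0, 0]
    for lane, sel in enumerate(decode_swizzle(imm)):
        if sel == SKIP:
            continue
        if sel == RESERVED:
            raise ValueError("selector 0b001 is reserved")
        if sel == CONST0:
            out[lane] = 0
        elif sel == CONST1:
            out[lane] = 1  # fmv.swiz would use 1.0 here
        else:
            out[lane] = src[sel & 0b11]
    return out


# Seven options per position need more than 11 bits but fit in 12.
assert 2**11 < 7**4 <= 2**12

imm = encode_swizzle("W..Y")
vec = [10, 20, 30, 40]
print(mv_swiz(vec, vec, imm, same_reg=True))
print(mv_swiz(vec, [1, 2, 3, 4], imm, same_reg=False))
```

With "W..Y", the in-place call leaves positions Y and Z untouched, while the RA!=RT call zeroes them, which is the distinction drawn in the Scalar section.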