The encoding for this instruction embeds static predication into the
swizzle as well as constants 1/1.0 and 0/0.0
+# Format
+
+| 0.5 |6.10|11.15|16.27|28.31| name |
+|-----|----|-----|-----|-----|--------------|
+|PO | RTp| RAp |imm | 0011| mv.swiz |
+|PO | RTp| RAp |imm | 1011| fmv.swiz |
+
+This gives a 12-bit immediate across bits 16 to 27.
+Each swizzle mnemonic (XYZW), commonly known from 3D GPU programming,
+has an associated index. 3 bits of the immediate are allocated
+to each:
+
+| imm |0.2 |3.5 |6.8|9.11|
+|-------|----|----|---|----|
+|swizzle|X | Y | Z | W |
+|index |0 | 1 | 2 | 3 |
+
+The options for each Swizzle are:
+
+* 0b000 to indicate "skip". This is equivalent to predicate masking
+* 0b001 is not needed (reserved)
+* 0b010 to indicate "constant 0" (or 0.0)
+* 0b011 to indicate "constant 1" (or 1.0)
+* 0b1NN to copy from the subelement at index NN (0 thru 3), i.e. position X, Y, Z or W
+
+Attempts to encode the swizzle in fewer than 12 bits proved unsuccessful: 7^4 is 2,401, which exceeds the 2,048 values that 11 bits can represent.
+
+Note that 7 options are needed (not 6) because the 7th option allows static
+predicate masking to be encoded within the swizzle immediate.
+For example this allows "W.Y." to specify: "copy W to position X,
+and Y to position Z, leave the other two positions Y and W unaltered"
+
+ 0 1 2 3
+ X Y Z W
+ | |
+ +----+ |
+ | | |
+ +--------------+
+ | | | |
+ W Y Y W
+
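The encoding above can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `encode_swizzle` and `apply_swizzle` and the textual notation (`.` for skip, `0` and `1` for the constants) are hypothetical, and it assumes MSB0 bit numbering, so that field 0.2 (destination X) occupies the top three bits of the 12-bit immediate.

```python
# Sketch only: helper names and the text notation are illustrative assumptions.
SKIP, RESERVED, CONST0, CONST1 = 0b000, 0b001, 0b010, 0b011
LANES = "XYZW"  # source subelement indices 0..3


def encode_swizzle(text):
    """Pack a 4-character swizzle such as "W.Y." into a 12-bit immediate.

    '.' = skip, '0' = constant 0, '1' = constant 1, X/Y/Z/W = copy source.
    Assumes MSB0 numbering: the selector for destination X is the top 3 bits.
    """
    if len(text) != 4:
        raise ValueError("swizzle must have exactly four positions")
    imm = 0
    for ch in text:
        if ch == ".":
            sel = SKIP
        elif ch == "0":
            sel = CONST0
        elif ch == "1":
            sel = CONST1
        else:
            sel = 0b100 | LANES.index(ch)  # 0b1NN, NN = source index
        imm = (imm << 3) | sel
    return imm


def apply_swizzle(imm, src, dest):
    """Return a copy of dest with each non-skipped position overwritten."""
    out = list(dest)
    for pos in range(4):
        sel = (imm >> (3 * (3 - pos))) & 0b111
        if sel == SKIP:
            continue  # static predicate masking: leave this position alone
        if sel == RESERVED:
            raise ValueError("0b001 is reserved")
        if sel == CONST0:
            out[pos] = 0
        elif sel == CONST1:
            out[pos] = 1
        else:
            out[pos] = src[sel & 0b11]
    return out
```

Under these assumptions, `encode_swizzle("W.Y.")` gives `0b111_000_101_000`, and applying it in-place to `["X", "Y", "Z", "W"]` yields `["W", "Y", "Y", "W"]`, matching the diagram. Seven selector values per position give 7**4 = 2401 combinations, more than the 2**11 = 2048 available in 11 bits, which is why the full 12 bits are used.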
**As a Scalar instruction**
Given that XYZW Swizzle can select simultaneously between one *and four*
| Z | RA+1 | RT+1 | lo-half |
| W | RA+1 | RT+1 | hi-half |
-When RA=RT (in-place swizzle) any portion of RT not covered by
+When `RA=RT` (in-place swizzle) any portion of RT not covered by
the Swizzle is unmodified. For example a Swizzle of "..XY"
will copy the contents RA+1 into RT but leave RT+1 unmodified.
-When RA!=RT any part of RT or RT+1 not set as a destination by
+When `RA!=RT` any part of RT or RT+1 not set as a destination by
the Swizzle will be set to zero. A Swizzle of "..XY" would
copy the contents RA+1 into RT, but set RT+1 to zero.
*Implementor's note: the cost of Vertical-First Mode in an Embedded design
of storing four 64-bit in-flight elements may be too high. If this is the
-case it is acceptable to throw an Illegal Instruction Trap.
+case it is acceptable to throw an Illegal Instruction Trap, and emulate
+the instruction in software. Performance will obviously be adversely affected.
See [[sv/compliancy_levels]]*
-# Format
-
-| 0.5 |6.10|11.15|16.27|28.31| name |
-|-----|----|-----|-----|-----|--------------|
-|PO | RTp| RAp |imm | 0011| mv.swiz |
-|PO | RTp| RAp |imm | 1011| fmv.swiz |
-
-this gives a 12 bit immediate across bits 16 to 27.
-Each swizzle mnemonic (XYZW), commonly known from 3D GPU programming,
-has an associated index. 3 bits of the immediate are allocated
-to each:
-
-| imm |0.2 |3.5 |6.8|9.11|
-|-------|----|----|---|----|
-|swizzle|X | Y | Z | W |
-|index |0 | 1 | 2 | 3 |
-
-the options for each Swizzle are:
-
-* 0b000 to indicate "skip". this is equivalent to predicate masking
-* 0b001 is not needed (reserved)
-* 0b010 to indicate "constant 0"
-* 0b011 to indicate "constant 1" (or 1.0)
-* 0b1NN index 0 thru 3 to copy from subelement in pos XYZW
-
-Evaluating efforts to encode 12 bit swizzle into less proved unsuccessful: 7^4 comes out to 2,400 which is larger than 11 bits.
-
-Note that 7 options are needed (not 6) because the 7th option allows static
-predicate masking to be encoded within the swizzle immediate.
-For example this allows "W.Y." to specify: "copy W to position X,
-and Y to position Z, leave the other two positions Y and W unaltered"
-
- 0 1 2 3
- X Y Z W
- | |
- +----+ |
- | | |
- +--------------+
- | | | |
- W Y Y W
-
# RM Mode Concept:
MVRM-2P-2S1D: