* vec2/3/4 "Packing" and "Unpacking" (similar to VSX `vpack` and `vpkss`)
accessible in a way that is easier than REMAP, added for the same reasons
that drove `vpack` and `vpkss` etc. to be added: pixel, audio, and 3D
- data manipulation.
+ data manipulation. With Pack/Unpack being part of SVSTATE, it can be
+ applied *in-place*, saving register file space (no copy/mv needed).
* Load/Store speculative "fault-first" behaviour, identical to ARM and RVV
Fault-first: provides auto-truncation of a speculative LD/ST helping
solve the "SIMD Considered Harmful" stripmining problem from a Memory
* CR Field ops
* Branch-Conditional - saves on instruction count in 3D parallel if/else
+**Vectorised Branch-Conditional**
+
+As mentioned in the introduction, this is the sole instruction group
+whose pseudocode differs from that of its scalar equivalent. However,
+even there, its various Mode bits and options can be set such that, in
+the degenerate case, the behaviour becomes identical to Scalar
+Branch-Conditional.
+
+The two additional Modes within Vectorised Branch-Conditional, which
+may be combined, are `CTR-Mode` and `VLI-Test` (aka "Data Fail First").
+CTR-Mode extends the way that CTR may be decremented unconditionally
+within Scalar Branch-Conditional: it not only makes the decrement
+conditional but also interacts with predication. VLI-Test provides the
+same option as Data-Dependent Fault-First to Deterministically truncate
+the Vector Length at the fail **or success** point.
+
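As a rough illustration of truncating the Vector Length at a fail or success point, the sketch below scans per-element test results and stops at the first element that matches the chosen outcome. It is a simplified model, not the specification pseudocode: the function name `truncate_vl` and the `stop_on` and `inclusive` parameters are hypothetical, and whether the triggering element is kept is assumed here to be a selectable option.

```python
def truncate_vl(tests, vl, stop_on=False, inclusive=False):
    """Return a new vector length after scanning per-element test results.

    tests     -- list of booleans, one result per element
    vl        -- current vector length (number of elements to scan)
    stop_on   -- outcome that triggers truncation (False = fail, True = success)
    inclusive -- hypothetical flag: keep the triggering element if True
    """
    for i in range(vl):
        if tests[i] == stop_on:
            # Truncate at the trigger point, optionally including it.
            return i + 1 if inclusive else i
    # No element triggered truncation: the length is unchanged.
    return vl


# Example: the third element fails, so only the first two are kept.
new_vl = truncate_vl([True, True, False, True], vl=4)
```

The result depends only on the test outcomes and the scan order, which is what makes the truncation deterministic in this model.
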
+Boolean Logic rules on sets (treating the Vector of CR Fields to be tested by
+`BO` as a set) dictate that the Branch should take place either when 'ALL'
+tests succeed (or fail) or when 'SOME' tests succeed (or fail).
+These options cover the majority of Parallel
+3D GPU Conditions, saving a considerable number of instructions,
+especially given the close interaction with CTR in hot-loops.
+
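The 'ALL' and 'SOME' rules can be sketched as a reduction over the per-element test results. This is an illustrative model only: `branch_taken`, `mode` and `on_success` are hypothetical names, and CTR and predication are deliberately left out.

```python
def branch_taken(tests, mode="ALL", on_success=True):
    """Decide whether to branch from a vector of per-element test results.

    tests      -- list of booleans, one per CR Field tested
    mode       -- "ALL": every element must match; "SOME": at least one must
    on_success -- match on tests that succeed (True) or on tests that fail (False)
    """
    if mode not in ("ALL", "SOME"):
        raise ValueError("mode must be 'ALL' or 'SOME'")
    matches = [t == on_success for t in tests]
    # Note: Python's all() is True on an empty list; how an empty set of
    # tests is treated by the hardware is not covered by this sketch.
    return all(matches) if mode == "ALL" else any(matches)


# Example: one of the two tests fails, so 'ALL succeed' does not branch
# but 'SOME succeed' does.
all_result = branch_taken([True, False], mode="ALL")
some_result = branch_taken([True, False], mode="SOME")
```

The four combinations of `mode` and `on_success` correspond to the 'ALL succeed', 'ALL fail', 'SOME succeed' and 'SOME fail' cases described above.
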
**SVP64Single**
The `SVP64-Single` 24-bit encoding focusses primarily on ensuring that