when combined with vec2/3/4 the reordering can even go as far as
four dimensions (four nested fixed size loops).
-Overall the LD/ST Modes available are extremely powerful, especially
+Twin Predication is worth a special mention. Many Vector ISAs have
+special LD/ST `VCOMPRESS` and `VREDUCE` instructions, which sequentially
+skip elements based on predicate mask bits. They also add special
+`VINSERT` and `VEXTRACT` Register-based instructions to compensate
+for the lack of single-element LD/ST (where in Simple-V you just use
+Scalar LD/ST). Broadcasting (`VSPLAT`) is likewise added either to
+LD/ST or as a Register-based instruction.
+
+*All of the above modes are covered by Twin Predication.*
+
+In particular, a special predicate mode `1<<r3` uses the *binary* value
+of register `r3` effectively as a single (Scalar) Index offset into
+what would otherwise be a Vector operation. When `1<<r3` is used as a
+source predicate and combined with the (mis-named) "mapreduce" mode,
+a `VSPLAT` is performed. When used as a destination predicate, `1<<r3`
+provides `VINSERT` behaviour.
+
+Also worth an explicit mention: when the source and destination
+predicate masks differ, Twin Predication effectively combines
+back-to-back `VCOMPRESS` and `VEXPAND` in a single instruction.
+Further, the benefits of Twin Predication are not limited to LD/ST:
+they may be applied to Arithmetic, Logical and CR Field operations as well.
+
+Overall the LD/ST Modes available are astoundingly powerful, especially
when combining arithmetic (lharx) with saturation, element-width overrides,
+Twin Predication,
vec2/3/4 Structure Packing *and* REMAP, the combinations far exceed anything
seen in any other Vector ISA in history, yet are really nothing more
than concepts abstracted out in pure RISC form.[^ldstcisc]
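
The element-skipping behaviour that the added text attributes to Twin Predication can be sketched in a few lines of Python. This is an illustrative model only, not the specification's pseudocode: the function name `twin_predicate_copy`, the use of plain Python integers as predicate masks, and the choice to leave unwritten destination elements unchanged are all assumptions made for the example. It does not model the "mapreduce" mode, so `VSPLAT` is not shown.

```python
def twin_predicate_copy(src, dst, src_mask, dst_mask, vl):
    """Hypothetical model: copy src to dst under two independent masks.

    Bit i of src_mask enables source element i; bit j of dst_mask enables
    destination element j. Masked-out elements are skipped independently
    on each side, so the two element indices advance at different rates.
    """
    out = list(dst)          # unwritten destination elements are left as-is
    i = j = 0                # separate source and destination step counters
    while True:
        while i < vl and not (src_mask >> i) & 1:
            i += 1           # skip masked-out source elements
        while j < vl and not (dst_mask >> j) & 1:
            j += 1           # skip masked-out destination elements
        if i >= vl or j >= vl:
            break
        out[j] = src[i]
        i += 1
        j += 1
    return out


SRC = [10, 11, 12, 13]
ZERO = [0, 0, 0, 0]
ALL = 0b1111

# Sparse source mask, full destination mask: compress-like.
compressed = twin_predicate_copy(SRC, ZERO, 0b1010, ALL, 4)
# Full source mask, sparse destination mask: expand-like.
expanded = twin_predicate_copy(SRC, ZERO, ALL, 0b1010, 4)
# Different sparse masks on both sides: compress then expand in one pass.
both = twin_predicate_copy(SRC, ZERO, 0b0101, 0b1010, 4)
# Single-bit masks of the form 1<<r3, with r3 = 2 assumed for illustration.
r3 = 2
inserted = twin_predicate_copy(SRC, ZERO, ALL, 1 << r3, 4)   # insert-like
extracted = twin_predicate_copy(SRC, ZERO, 1 << r3, ALL, 4)  # extract-like
```

With a single-bit mask on one side, only one element is ever transferred, which is why the same loop gives insert-like and extract-like behaviour without any dedicated instruction.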