* Condition Register Field operations
* branch
+**Arithmetic**
+
Arithmetic (known as "normal" mode) is where Scalar and Parallel
Reduction can be done: Saturation as well, and two new innovative
modes for Vector ISAs: data-dependent fail-first and predicate result.
it is critical to think in terms of the "rules", that everything is
Scalar instructions in strict Program Order.
+**Branches**
+
Branch is the one and only place where the Scalar
(non-prefixed) operations differ from the Vector (element)
instructions, as explained in a separate section.
which are expected of a Vector / GPU ISA. These save a considerable
number of instructions in tight inner loop situations.
+**CR Field Ops**
+
Condition Register Fields are 4-bit wide and consequently element-width
overrides make absolutely no sense whatsoever. Therefore the elwidth
override field bits can be used for other purposes when Vectorising
reasoning and deduction, help explain why there is an entirely different
CR ops Vectorisation Category.
+**Load/Store**
+
LOAD/STORE is another area that has different needs: this time it is
down to limitations in Scalar LD/ST. Vector ISAs have Load/Store modes
-which simply make no sense in a RISC Scalar ISA:
+which simply make no sense in a RISC Scalar ISA: element-stride and
+unit-stride, and indeed the entire concept of a stride (a spacing
+between elements), have no place at all in a Scalar ISA. The problems
+come when trying to *retrofit* the concept of "Vector Elements" onto
+a Scalar ISA, and it required a couple of bits (Modes) in the SVP64
+RM Prefix to convey the stride mode, changing the Effective Address
+computation as a result. Worth noting for Hardware designers: it did
+turn out to be possible to pre-multiply the D/DS Immediate by the
+stride amount, which avoids modifying the LD/ST Pipeline itself.
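
The stride modes described above can be illustrated with a minimal sketch.
It is not taken from the specification. The function names, the
`element_strided` flag and the exact offset formulas are assumptions made
for illustration: element-stride is assumed to scale the immediate by the
element index, and unit-stride is assumed to space elements by the
operation width. The point it shows is that only the offset changes,
while the base-plus-offset adder stays the same as in Scalar LD/ST.

```python
MASK64 = (1 << 64) - 1


def scalar_ea(base, offset):
    """Unmodified Scalar D-form style address: base plus offset, 64-bit wrap."""
    return (base + offset) & MASK64


def vector_eas(base, imm, vl, op_width, element_strided):
    """Illustrative per-element Effective Addresses (assumed formulas).

    base            -- value of the base register (hypothetical input)
    imm             -- the D/DS immediate from the instruction
    vl              -- number of elements to process
    op_width        -- bytes accessed per element (1, 2, 4 or 8)
    element_strided -- hypothetical flag standing in for the RM Prefix mode
    """
    addrs = []
    for i in range(vl):
        if element_strided:
            # assumption: the immediate acts as the stride, pre-multiplied
            # by the element index before it reaches the adder
            offset = imm * i
        else:
            # assumption: unit-stride packs elements contiguously
            offset = imm + i * op_width
        # the adder itself is the ordinary Scalar one
        addrs.append(scalar_ea(base, offset))
    return addrs


if __name__ == "__main__":
    print([hex(a) for a in vector_eas(0x1000, 24, 4, 8, True)])
    print([hex(a) for a in vector_eas(0x1000, 16, 4, 8, False)])
```

Because the multiplication happens on the immediate before the call to
`scalar_ea`, the address adder in this sketch needs no knowledge of
Vectors at all. The specification remains the authority on the actual
Effective Address rules.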
+
+Other areas where LD/ST went quirky: element-width overrides, especially
+when combined with Saturation, given that LD/ST operations have byte,
+halfword, word, dword and quad variants. The interaction between these
+operation widths and the source and destination elwidth overrides was
+particularly obscure and hard to derive: some care and attention is
+advised here when reading the specification.
# CR weird instructions