Even in OpenPOWER v3.0B, the Scalar Integer ISA is around 150 instructions, with IEEE754 FP adding approximately 80 more. VSX, being based on SIMD design principles, adds somewhere in the region of 600 more. SimpleV again provides over 95% of VSX functionality, simply by augmenting the *Scalar* OpenPOWER ISA, and in the process provides features such as predication, which VSX lacks entirely.

In fairness to both VSX and RVV, there are some things that SimpleV does not provide:

* 128-bit or wider arithmetic and other operations
 (VSX Rijndael and SHA primitives; VSX shuffle and bitpermute operations)
* register files with more than 128 entries
* Vector lengths over 64
* Unit-strided LD/ST and other comprehensive memory operations
 (struct-based LD/ST from RVV, for example)
* 32-bit instruction lengths: [[svp64]] had to be added as a 64-bit format.

None of these limitations is insurmountable, and over time they may well be addressed in future revisions of SV.

The rest of this document builds on the above simple loop to add:
* Vector-Scalar, Scalar-Vector and Scalar-Scalar operation
* Traditional Vector operations (VSPLAT, VINSERT, VCOMPRESS etc.)
* Predication masks (essential for parallel if/else constructs)
* 8-, 16- and 32-bit integer operations, and both FP16 and BF16.
* Compacted operations into registers (normally only provided by SIMD)
* Fail-on-first (introduced in ARM SVE)
* A new concept: Data-dependent fail-first
* Condition-Register based *post-result* predication (also new)

All of this is *without modifying the OpenPOWER v3.0B ISA*, except to add "wrapping context", similar to how v3.1B 64-bit prefixes work.

# Adding Scalar / Vector
The first augmentation to the simple loop is to allow every source and destination operand to be either scalar or vector. As an FSM, this is where our "simple" loop gets its first complexity.
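A minimal Python sketch of this first augmentation, for illustration only and not taken from the SV specification: the `Operand` descriptor with its `isvec` flag, the `sv_add` helper, the flat `regs` list standing in for the integer register file, and 64-bit wraparound are all hypothetical modelling choices. Each operand keeps its own element offset, and only vector operands advance it. The sketch also assumes that a scalar destination ends the loop after the first element, since there is only one register to write.

```python
from dataclasses import dataclass
from typing import List

MASK64 = (1 << 64) - 1  # assumption: 64-bit integer registers


@dataclass
class Operand:
    """Hypothetical operand descriptor: base register number plus a flag
    marking it as Vector (offset advances) or Scalar (offset stays fixed)."""
    reg: int
    isvec: bool


def sv_add(regs: List[int], rd: Operand, rs1: Operand, rs2: Operand,
           vl: int) -> None:
    """Model a Scalar 'add' wrapped in a loop of vl elements, where each
    operand may independently be scalar or vector (hypothetical helper)."""
    id_ = irs1 = irs2 = 0  # per-operand element offsets
    for _ in range(vl):
        regs[rd.reg + id_] = (regs[rs1.reg + irs1] + regs[rs2.reg + irs2]) & MASK64
        if not rd.isvec:
            # Scalar destination: only one register can be written, so the
            # loop stops after the first element (assumed behaviour).
            break
        id_ += 1
        # Only vector sources step forward; scalar sources are reused.
        if rs1.isvec:
            irs1 += 1
        if rs2.isvec:
            irs2 += 1


if __name__ == "__main__":
    regs = [0] * 128                  # model of a 128-entry register file
    regs[8:12] = [1, 2, 3, 4]         # vector source at r8..r11
    regs[20] = 100                    # scalar source at r20
    # Vector-Scalar add: r32..r35 = r8..r11 + r20
    sv_add(regs, Operand(32, True), Operand(8, True), Operand(20, False), vl=4)
    print(regs[32:36])                # [101, 102, 103, 104]
```

With all three operands marked scalar, the loop body runs exactly once and the sketch degenerates to the original scalar `add`, which is one way to see that the scalar behaviour is preserved.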