primarily controls the looping (quantity, order), RM
influences the *elements* (the Suffix). There is however
some close interaction when it comes to predication.
-REMAP is separately
-outlined in another section.
-
+REMAP is outlined separately.
The primary options all of which are aimed at reducing instruction
count and reducing assembler complexity are:
-* element-width overrides, which dynamically redefine each SFFS or SFS
+* **element-width overrides**, which dynamically redefine each SFFS or SFS
Scalar prefixed instruction to be 8-bit, 16-bit, 32-bit or 64-bit
operands **without requiring new 8/16/32 instructions.**[^pseudorewrite]
This results in full BF16 and FP16 opcodes being added to the Power ISA
**without adding BF16 or FP16 opcodes** including full conversion
between all formats.
-* predication. this is an absolutely essential feature for a 3D GPU VPU ISA.
+* **predication**.
+ This is an absolutely essential feature for a 3D GPU VPU ISA.
CR Fields are available as Predicate Masks hence the reason for their
extension to 128. Twin-Predication is also provided: this may best
be envisaged as back-to-back VGATHER-VSCATTER but is not restricted
of the predicates provides all of the other types of operations
found in Vector ISAs (VEXTRACT, VINSERT etc) again with no need
to actually provide explicit such instructions.
-* Saturation. **all** LD/ST and Arithmetic and Logical operations may
+* **Saturation**. **All** LD/ST and Arithmetic and Logical operations may
be saturated (without adding explicit scalar saturated opcodes)
-* Reduction and Prefix-Sum (Fibonnacci Series) Modes
-* vec2/3/4 "Packing" and "Unpacking" (similar to VSX `vpack` and `vpkss`)
+* **Reduction and Prefix-Sum** (Fibonacci Series) Modes, including a
+ "Reverse Gear".
+* **vec2/3/4 "Packing" and "Unpacking"** (similar to VSX `vpack` and `vpkss`)
accessible in a way that is easier than REMAP, added for the same reasons
that drove `vpack` and `vpkss` etc. to be added: pixel, audio, and 3D
data manipulation. With Pack/Unpack being part of SVSTATE it can be
applied *in-place* saving register file space (no copy/mv needed).
-* Load/Store speculative "fault-first" behaviour, identical to SVE and RVV
+* **Load/Store "fault-first"** speculative behaviour,
+ identical to SVE and RVV
Fault-first: provides auto-truncation of a speculative sequential parallel
LD/ST batch, helping
solve the "SIMD Considered Harmful" stripmining problem from a Memory
Access perspective.
-* Data-Dependent Fail-First: a 100% Deterministic extension of the LDST
+* **Data-Dependent Fail-First**: a 100% Deterministic extension of the LDST
ffirst concept: first `Rc=1 BO test` failure terminates looping and
truncates VL to that exact point. Useful for implementing algorithms
such as `strcpy` in around 14 high-performance Vector instructions, the
option exists to include or exclude the failing element.
-* Predicate-result: a strategic mode that effectively turns all and any
+* **Predicate-result**: a strategic mode that effectively turns all and any
operations into a type of `cmp`. An `Rc=1 BO test` is performed and if
failing that element result is **not** written to the regfile. The `Rc=1`
Vector of co-results **is** always written (subject to usual predication).
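
The last two modes in the list above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the specification's pseudocode: the function names `ddffirst` and `predicate_result` are hypothetical, registers are modelled as plain Python lists, and the `Rc=1 BO test` is reduced to a boolean callback `test` (real hardware would produce a CR Field per element).

```python
def ddffirst(op, test, src, vl, inclusive=False):
    """Hypothetical model of Data-Dependent Fail-First.

    Applies `op` element by element. The first element whose result
    fails `test` terminates the loop and truncates VL at that point.
    `inclusive` selects whether the failing element is kept.
    Returns (results, new_vl).
    """
    results = []
    for i in range(vl):
        r = op(src[i])
        if not test(r):
            if inclusive:
                results.append(r)
                return results, i + 1
            return results, i
        results.append(r)
    return results, vl


def predicate_result(op, test, src, dest, vl, mask=None):
    """Hypothetical model of Predicate-result mode.

    The result of `op` is written to `dest` only when `test` passes.
    The co-result (here a bool standing in for a CR Field) is written
    for every element that is not masked out by `mask`.
    Returns (dest, co_results); masked-out co-results stay None.
    """
    co_results = [None] * vl
    for i in range(vl):
        if mask is not None and not mask[i]:
            continue  # usual predication: element skipped entirely
        r = op(src[i])
        ok = test(r)
        co_results[i] = ok  # co-result always written
        if ok:
            dest[i] = r     # result written only on a passing test
    return dest, co_results


# strcpy-like use: copy bytes until the first zero is seen
data = [104, 105, 0, 7]
copied, new_vl = ddffirst(lambda x: x, lambda r: r != 0, data, 4)
copied_z, new_vl_z = ddffirst(lambda x: x, lambda r: r != 0, data, 4,
                              inclusive=True)

# keep only positive doubled values, leaving other dest elements untouched
dest, co = predicate_result(lambda x: x * 2, lambda r: r > 0,
                            [1, -2, 3, -4], [0, 0, 0, 0], 4)
```

In the `strcpy`-like call the `inclusive` flag decides whether the terminating zero is part of the truncated vector, which corresponds to the "include or exclude the failing element" option described above.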