1. Whilst it is easy to justify these high-value instructions, they are
sufficiently complex as to warrant placement as optional SFFS in
the new EXT2xx area (marked as Vectoriseable).
-2. Although they are 3-in 2-out the actual encoding is as double-overwrite
+2. Although they are 3-in 2-out, the actual encoding is a double-overwrite,
reducing the actual number of operands to three (RT, RA and RB)
- where RT is a Read-Modify-Write and an additional RS is implicit.
-* Although desirable (particularly to detect overflow) Rc=1 is hard to
- conceptualise. It is likely that instead, Simple-V "saturation" if
- enabled will create an Rc=1 CR.SO flag (including SVP64Single).
+ where RT is a Read-Modify-Write and an additional RS (normally RT+1) is implicit.
+3. As with the biginteger set of 3-in 2-out instructions, if Power ISA did not
+ already have LD/ST-with-Update, Load/Store-Quad, and other RTp and RAp instructions,
+ these instructions would not be proposed.
+4. The read and write of two overlapping registers normally requires
+ an intermediate register (similar to the justification for CAS, Compare-and-Swap).
+ When Vectorised the situation becomes even worse: an entire *Vector*
+ of intermediate temporaries is required.
+ Thus, *even if implemented inefficiently* and requiring more cycles to complete
+ (taking an extra cycle to write the second result), these instructions still
+ save on resources.
+5. Macro-op fusion equivalents of these instructions are *not possible* for
+ exactly the same reason that the equivalent CAS sequence cannot be macro-op
+ fused. Full in-place Vectorised FFT and DCT algorithms *only* become
+ possible because these instructions atomically read **both** operands
+ into internal Reservation Stations (exactly like CAS).
+6. Although desirable (particularly to detect overflow), Rc=1 is hard to
+ conceptualise. It is likely that instead Simple-V "saturation", if
+ enabled, will create an Rc=1 CR.SO flag (including SVP64Single).
+7. Saturated variants are **not** included: that is what SVP64 and SVP64Single
+ provide (SVP64 provides a signed/unsigned saturation enhancement).
+8. Unlike ARM, and with the exception of FP Single, 8-, 16- and 32-bit variants
+ are **not** included: that is what SVP64 and SVP64Single provide (SVP64
+ "redefines" "FP Single" to be "half of the register/element width").
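The overlapping-register hazard described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the definition of the proposed instructions. The register file is modelled as a plain list, and the implicit second register RS is taken to be RT+1, as noted above. The arithmetic is a generic radix-2 butterfly (`a + b*w`, `a - b*w`), chosen purely for illustration. All function names are hypothetical.

```python
# Illustrative model only: register file = Python list, RS assumed to be RT+1,
# arithmetic is a generic radix-2 butterfly. All names are hypothetical.

def butterfly_atomic(regs, rt, w):
    """Double-overwrite: read BOTH operands before writing either result."""
    a, b = regs[rt], regs[rt + 1]  # captured together, like CAS
    regs[rt] = a + b * w
    regs[rt + 1] = a - b * w


def butterfly_two_ops_no_temp(regs, rt, w):
    """Two single-result ops in place, no temporary: gives the WRONG answer."""
    regs[rt] = regs[rt] + regs[rt + 1] * w      # original a is now lost
    regs[rt + 1] = regs[rt] - regs[rt + 1] * w  # reads the NEW regs[rt]


def butterfly_two_ops_with_temp(regs, rt, w, tmp):
    """Two single-result ops are only correct with an extra temp register."""
    regs[tmp] = regs[rt]                         # save original a
    regs[rt] = regs[tmp] + regs[rt + 1] * w      # b still unmodified here
    regs[rt + 1] = regs[tmp] - regs[rt + 1] * w


def vector_butterfly_two_ops(a_vec, b_vec, w):
    """Vectorised two-op form: a whole Vector of temporaries is needed."""
    tmp = list(a_vec)                            # one temporary per element
    for i in range(len(a_vec)):
        a_vec[i] = tmp[i] + b_vec[i] * w
    for i in range(len(b_vec)):
        b_vec[i] = tmp[i] - b_vec[i] * w
```

Start with `regs = [3, 2, 0]` and `w = 5`. The atomic form leaves `[13, -7, 0]`. The two-op form without a temporary leaves `[13, 3, 0]`, so its second result is wrong. The temporary version produces the correct pair but clobbers a third register. In the Vectorised case, `tmp` must be as long as the entire Vector.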
**Changes**