is as if the
*destination* predicate bit was zero even before starting the operation.
When Rc=1, however, the CR element is still stored in the CR regfile, even if the test failed. See the appendix for details.
+* **Pack/Unpack** mode, only available when SUBVL is vec2/3/4, performs
+basic structure packing on sub-elements. Bits 4-5 (normally elwidth) are
+taken up as Pack/Unpack bits.
Note that ffirst and reduce modes are not anticipated to be high-performance in some implementations: ffirst due to its interactions with VL, and reduce because it requires additional operations to produce a result. The normal, saturate and pred-result modes, however, are inter-element independent and may easily be parallelised to give high performance, regardless of the value of VL.
| 00 | 0 | dz sz | normal mode |
| 00 | 1 | 0 RG | scalar reduce mode (mapreduce), SUBVL=1 |
| 00 | 1 | 1 / | parallel reduce mode (mapreduce), SUBVL=1 |
-| 00 | 1 | SVM RG | subvector reduce mode, SUBVL>1 |
+| 00 | 1 | SVM 0 | subvector reduce mode, SUBVL>1 |
+| 00 | 1 | SVM 1 | Pack/Unpack mode, SUBVL>1 |
| 01 | inv | CR-bit | Rc=1: ffirst CR sel |
| 01 | inv | VLi RC1 | Rc=0: ffirst z/nonz |
| 10 | N | dz sz | sat mode: N=0/1 u/s |
Thus the new VL comprises a contiguous vector of results,
all of which pass the testing criteria (equal to zero, less than zero).
-The CR-based data-driven fail-on-first is new and not found in ARM
-SVE or RVV. It is extremely useful for reducing instruction count,
+The CR-based data-driven fail-on-first is "new" and not found in ARM
+SVE or RVV. At the same time it is "old" because it is almost
+identical to a generalised form of Z80's `CPIR` instruction.
+It is extremely useful for reducing instruction count,
but requires speculative execution involving modifications of VL
to get high-performance implementations. An additional mode (RC1=1)
effectively turns what would otherwise be an arithmetic operation