As covered in [[biginteger/analysis]], the summary is that standard `adde`
is sufficient for SVP64 Vectorisation of big-integer addition (and `subfe`
for subtraction), but that big-integer shift, multiply and divide require
-extra 3-in 2-out instructions, similar to Intel's `shld`, `shrd`,
+extra 3-in 2-out instructions, similar to Intel's
+[shld](https://www.felixcloutier.com/x86/shld)
+and [shrd](https://www.felixcloutier.com/x86/shrd),
`mulx` and `idiv`, to be efficient.
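A minimal Python sketch of why `adde` alone is enough for big-integer addition: if each vector element holds one 64-bit limb (least-significant limb first, an assumption of this sketch), the carry-out of element *i* is simply the `CA` input of element *i+1*. `subfe` chains the same way for subtraction, with `CA` initially set to mean "no borrow". Here `adde` and `subfe` are simplified models of the register semantics (RT = RA + RB + CA and RT = ¬RA + RB + CA), not the ISA pseudocode, and `big_add`/`big_sub` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1  # limbs are unsigned 64-bit integers


def adde(ra: int, rb: int, ca: int) -> tuple[int, int]:
    """Model of adde: RT = RA + RB + CA; CA receives the carry-out."""
    total = ra + rb + ca
    return total & MASK64, total >> 64


def subfe(ra: int, rb: int, ca: int) -> tuple[int, int]:
    """Model of subfe: RT = ~RA + RB + CA, i.e. RB - RA - (1 - CA) mod 2**64."""
    total = (~ra & MASK64) + rb + ca
    return total & MASK64, total >> 64


def big_add(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """x + y over little-endian limb lists, chaining CA from element to
    element as a Vectorised sv.adde would. Returns (limbs, final CA)."""
    ca = 0  # assumption: CA starts clear
    out = []
    for xi, yi in zip(x, y):
        rt, ca = adde(xi, yi, ca)
        out.append(rt)
    return out, ca


def big_sub(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """x - y via chained subfe. CA starts set (no borrow); a final CA
    of 0 signals that a borrow came out of the top limb."""
    ca = 1
    out = []
    for xi, yi in zip(x, y):
        rt, ca = subfe(yi, xi, ca)  # subfe computes RB - RA, so RA = yi
        out.append(rt)
    return out, ca
```

For example, `big_add([MASK64, 0], [1, 0])` returns `([0, 1], 0)`: the carry out of the low limb lands in the high limb.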
-The same instruction (`maddedu`) is used for both because 'maddedu''s primary
+The same instruction (`maddedu`) is used in both
+big-divide and big-multiply because `maddedu`'s primary
purpose is to perform a fused 64-bit scalar multiply with a large vector,
where that result is Big-Added for Big-Multiply, but Big-Subtracted for
Big-Divide.
+Chaining the operations together gives Scalar-by-Vector
+operations, except for `sv.adde` and `sv.subfe`, which are
+Vector-by-Vector Chainable (through the `CA` flag).
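The Scalar-by-Vector chaining can be sketched in Python as follows. For illustration only, this assumes `maddedu` produces the 128-bit value RA × RB + RC split into low and high 64-bit halves, with the high half fed back as the next element's RC; the real instruction's register allocation is not modelled, and `mul_scalar`/`big_mul` are hypothetical names. Each row from `mul_scalar` is then Big-Added, with chained `adde`, into an accumulator, giving a schoolbook Big-Multiply.

```python
MASK64 = (1 << 64) - 1


def maddedu(ra: int, rb: int, rc: int) -> tuple[int, int]:
    """Model of maddedu: ra * rb + rc as (low 64 bits, high 64 bits).
    The sum is at most 2**128 - 2**64, so it always fits in 128 bits."""
    prod = ra * rb + rc
    return prod & MASK64, prod >> 64


def adde(ra: int, rb: int, ca: int) -> tuple[int, int]:
    """Model of adde: (RA + RB + CA) mod 2**64, plus the carry-out."""
    total = ra + rb + ca
    return total & MASK64, total >> 64


def mul_scalar(a: list[int], b: int) -> list[int]:
    """Scalar-by-Vector: multiply little-endian limbs `a` by one 64-bit
    scalar `b`. Each product's high half is the next element's addend
    (rc); the result has len(a) + 1 limbs."""
    hi = 0
    out = []
    for limb in a:
        lo, hi = maddedu(limb, b, hi)
        out.append(lo)
    out.append(hi)
    return out


def big_mul(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook Big-Multiply: each Scalar-by-Vector row is Big-Added
    into the accumulator at an offset of j limbs."""
    acc = [0] * (len(a) + len(b))
    for j, bj in enumerate(b):
        ca = 0
        for i, limb in enumerate(mul_scalar(a, bj)):
            acc[i + j], ca = adde(acc[i + j], limb, ca)
        # the final carry of each row is always zero: the running
        # partial product never exceeds len(a) + j + 1 limbs
    return acc
```

`mul_scalar([MASK64, 5], 7)` returns three limbs, one more than its input, because the last high half becomes the new top limb.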
Macro-op Fusion and back-end massively-wide SIMD ALUs may be deployed in a
fashion that is hidden from the user, behind a consistent, stable ISA API.
The same macro-op fusion may theoretically be deployed even on Scalar
**DRAFT**
-`dsld` and `dsrd` are is similar to v3.0 `sld`, and
+`dsld` and `dsrd` are similar to v3.0 `sld`, and
are Z23-Form, in "overwrite" mode on RT.
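As a rough illustration of why a 3-in 2-out shift helps, here is a Python sketch of a big-integer left shift. The `dsld_like` helper is hypothetical: it is modelled on Intel's `shld` (bits spilled out of one limb are shifted into the next), not on the draft `dsld` pseudocode. Limbs are assumed least-significant first, and the shift amount is limited to 0..63 bits.

```python
MASK64 = (1 << 64) - 1


def dsld_like(x: int, shift_in: int, n: int) -> tuple[int, int]:
    """Hypothetical 3-in 2-out double shift left, modelled on Intel's shld.
    x is a 64-bit limb, shift_in holds the bits spilled from the next-lower
    limb (right-aligned, < 2**n), and 0 <= n < 64.
    Returns (shifted limb, bits spilled out of the top of x)."""
    wide = (x << n) | shift_in
    return wide & MASK64, wide >> 64


def big_shl(a: list[int], n: int) -> list[int]:
    """Shift little-endian limbs left by n bits, chaining the spilled bits
    of each limb into the next one; returns len(a) + 1 limbs."""
    if not 0 <= n < 64:
        raise ValueError("n must be in 0..63")
    spill = 0
    out = []
    for limb in a:
        shifted, spill = dsld_like(limb, spill, n)
        out.append(shifted)
    out.append(spill)
    return out
```

`big_shl([MASK64, 1], 4)` gives `[0xFFFFFFFFFFFFFFF0, 0x1F, 0]`: the four bits spilled from the low limb appear at the bottom of the next one.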
|0.....5|6..10|11..15|16..20|21.22|23..30|31|