is sufficient for SVP64 Vectorisation of big-integer addition (and `subfe`
for subtraction) but that big-integer shift, multiply and divide require
extra 3-in 2-out instructions, similar to Intel's
-[shld](https://www.felixcloutier.com/x86/shld)
-and [shrd](https://www.felixcloutier.com/x86/shrd),
+[shlq](https://www.felixcloutier.com/x86/shld)
+and [shrq](https://www.felixcloutier.com/x86/shrd),
`mulx` and `divq`, to be efficient.
The same instruction (`maddedu`) is used in both
big-divide and big-multiply because `maddedu`'s primary