Three instructions are needed instead of one in the case of
big-shift because multiple bits chain through to the next element,
whereas for add it is a single bit (carry-in, carry-out),
and this is precisely what `adde` already does.
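To make the contrast concrete, here is a minimal Python sketch. It assumes big integers stored as little-endian lists of 64-bit limbs; `MASK64`, `big_add` and `big_shl` are hypothetical names used only for illustration, not Libre-SOC or Power ISA code. In `big_add` only one carry bit passes from limb to limb, as with an `adde` chain. In `big_shl` up to `n` bits pass between limbs, so each output limb takes two shifts and an OR, which is one plausible mapping of the three instructions.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper: keep a value within one 64-bit limb


def big_add(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Only a single carry bit (0 or 1) chains from one limb to the next,
    which is the role adde plays in hardware.
    """
    assert len(a) == len(b)
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = x + y + carry
        result.append(s & MASK64)  # low 64 bits of the sum
        carry = s >> 64            # carry-out: at most one bit
    return result, carry


def big_shl(a, n):
    """Shift a little-endian list of 64-bit limbs left by n bits (0 < n < 64).

    Up to n bits chain from each limb into the next, so every output limb
    is built from two shifts and an OR rather than a single instruction.
    Returns the shifted limbs and the n bits shifted out of the top limb.
    """
    assert 0 < n < 64
    result = []
    prev = 0  # the limb below the current one (zero below limb 0)
    for x in a:
        lo = (x << n) & MASK64   # shift 1: this limb's bits, moved up
        hi = prev >> (64 - n)    # shift 2: top n bits of the limb below
        result.append(lo | hi)   # OR glues the two parts together
        prev = x
    return result, prev >> (64 - n)
```

For example, `big_shl([MASK64, 0], 4)` moves the top four bits of limb 0 into the bottom of limb 1, whereas `big_add([MASK64, 0], [1, 0])` passes just one carry bit across the same boundary.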
For multiply and divide, as shown later, it is worthwhile to use
one scalar register effectively as a full 64-bit carry/chain,
but in the case of shift an OR may glue things together easily,