For multiply and divide as shown later it is worthwhile to use
one scalar register effectively as a full 64-bit carry/chain
but in the case of shift, an OR may glue things together, easily,
-and in parallel.
+and in parallel, because, unlike with `sv.adde`, down-chain
+carry propagation through multiple elements does not occur.
With Scalar shift and rotate operations in the Power ISA already being
complex and very comprehensive, it is hard to justify creating complex