  partition:       1     0     0     1        (4 bits)
  actual   :  <--->|<--------------->|<--->   actual numbers
  carryotmp:   o4    o3    o2    o1    o0     (5 bits)
- cascade  :   |     |     x     x     |
-          :   v     v     x     x     v
+ cascade  :   |     |     x     x     |      o2 and o1 ignored
  carry-out:   o4    \->   -->   o3    o0     (5 bits)
-
+Because the partitions subdivide the 5-byte-wide input into 8-24-8 bits, o4 is
+already in position: the top 8-bit result is a single byte, so its MSB and LSB
+carry positions coincide. o3 is the carry-out for the 24-bit result and must
+be cascaded down to the *beginning* (the LSB end) of that 24-bit partition,
+which is why it appears in the o1 slot. o0, like o4, is already in position
+because its partition is only one byte wide.
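
The cascade described above can be sketched in plain Python. This is only an illustrative model, not the hardware implementation: the function name `cascade_carry_out`, the use of `None` for ignored lanes, and the LSB-first list ordering are all assumptions made for the example.

```python
def cascade_carry_out(partition, carry_tmp):
    """Move each partition's carry-out down to that partition's LSB lane.

    partition: n-1 break bits, index 0 sits between lanes 0 and 1
               (LSB-first); 1 means "break here".
    carry_tmp: n per-lane carry values, index 0 is the lowest lane.
    Returns n entries; lanes that hold no result are None (ignored).
    """
    n = len(carry_tmp)
    if len(partition) != n - 1:
        raise ValueError("need one partition bit between each pair of lanes")
    carry_out = [None] * n
    start = 0  # lowest lane of the partition currently being walked
    for lane in range(n):
        # A lane is the top of its partition if it is the last lane
        # or if there is a break immediately above it.
        is_top = lane == n - 1 or partition[lane] == 1
        if is_top:
            carry_out[start] = carry_tmp[lane]
            start = lane + 1
    return carry_out


# The 8-24-8 example: partition bits 1 0 0 1 (symmetric, so the
# LSB-first and MSB-first spellings are the same).
result = cascade_carry_out([1, 0, 0, 1], ["o0", "o1", "o2", "o3", "o4"])
print(list(reversed(result)))  # MSB-first, to match the diagram
# -> ['o4', None, None, 'o3', 'o0']
```

Single-lane partitions leave their carry where it is (o4 and o0), while the carry from the top lane of the 24-bit partition (o3) lands in that partition's lowest lane, the slot originally occupied by o1.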