| 100 | A3A2A1A0 | [A7A7A7A7] | A7A6A5A4 | A3A2A1A0 |
| 101 | A3A2A1A0 | A7A6A5A4 | A3A2A1A0 | A3A2A1A0 |
| 110 | A3A2A1A0 | A3A2A1A0 | A7A6A5A4 | A3A2A1A0 |
-| 111 | A3A2A1A0 | A3A2A1A0 | A3A2A1A0v | A3A2A1A0 |
+| 111 | A3A2A1A0 | A3A2A1A0 | A3A2A1A0 | A3A2A1A0 |
Note how, when the entire partition set is open (1x 16-bit output),
all of A is copied out and either zero- or sign-extended.
Unlike the parallel case, A is not itself partitioned, so it is copied
over as much as possible. In some cases such as `1x 4-bit, 1x 12-bit`
-(partition mask = `0b100`, above) the 8-bit scalar source will need sign or zero extending.
+(partition mask = `0b100`, above), when copying the 8-bit scalar source
+into the highest part of B (o3), it is truncated to 4 bits (because
+each partition of B is only 4 bits), but when copying to the 12-bit partition
+(o2-o1-o0) the 8-bit scalar source, A, will need sign or zero extending.
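The truncate-or-extend rule above can be modelled with a short sketch. It is not taken from any existing implementation: the function name `scalar_to_partitions` is hypothetical, and two conventions are assumptions inferred from the table rows, namely that the table columns are o3, o2, o1, o0 (most significant first), and that mask bit `i` being set marks a partition boundary between 4-bit lane `i` and lane `i+1`.

```python
def scalar_to_partitions(a: int, mask: int, signed: bool) -> int:
    """Hypothetical model: copy 8-bit scalar `a` into every partition of a
    16-bit B made of four 4-bit lanes (o0 = lowest lane).

    Assumption: bit i of the 3-bit `mask` set means a partition boundary
    between lane i and lane i+1.
    """
    assert 0 <= a <= 0xFF and 0 <= mask <= 0b111
    result = 0
    start = 0  # first lane of the partition currently being built
    for lane in range(4):
        # A partition ends at the top lane, or wherever the mask has a boundary.
        if lane == 3 or (mask >> lane) & 1:
            width = (lane - start + 1) * 4          # partition width in bits
            if width <= 8:
                # Narrow partition: keep only the low `width` bits of A (truncate).
                value = a & ((1 << width) - 1)
            elif signed and (a & 0x80):
                # Wide partition, negative A: sign-extend to `width` bits.
                value = (a - 0x100) & ((1 << width) - 1)
            else:
                # Wide partition, unsigned or non-negative A: zero-extend.
                value = a
            result |= value << (start * 4)
            start = lane + 1
    return result
```

For example, with `a = 0xA5` and mask `0b100` the sketch gives `0x5FA5` when sign-extending (o3 = A3..A0, then a 12-bit sign-extended copy of A) and `0x50A5` when zero-extending; with mask `0b111` every 4-bit lane receives A3..A0, giving `0x5555`.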