Reservation Stations / Function Units, and block more often, we actually
don't mind so much. Also, we can still apply the same "banks" trick on
the Register File, except this time with 4-way multiplexing on 32-bit
-wide banks, and 4x4 crossbars on the bytes:
+wide banks, and 4x4 crossbars on the bytes as well:
+To cope with 16-bit operations, pairs of 8-bit values in adjacent Function
+Units are reserved. Likewise for 64-bit operations, the 8-bit crossbars
+are not used, and pairs of 32-bit source values in adjacent Function Units
+in the *32-bit* FU area are reserved.
+However, the gate count in such a staggered crossbar arrangement is insane:
+bear in mind that this will be 3R1W or 2R1W (3 or 2 reads, 1 write per
+register), and that means **three** sets of crossbars, comprising **four**
+banks, with effectively 16-byte to 16-byte routing.
+It's too much, so this will be explored further in later updates.