{{reorder_alias_bytemask_scheme.png}}
+So, if you recall from the previous updates about Scoreboards, it's not
+the "scoreboard" itself that's the key: it's the Register to Function Unit
+and FU to FU Dependency Matrices that are the (misunderstood) key.
+So let's explain this diagram. Firstly, in purple in the bottom left
+is a massive matrix of FU to FU, just as with the standard CDC 6600,
+except now there are separate 32-bit FUs, 16-bit FUs, and 8-bit FUs.
+In this way, we can have 32-bit ADD depending on and waiting for
+an 8-bit computation, or 16-bit MUL on a 32-bit SQRT and so on. Nothing
+immediately obviously different there.
+Likewise, in the bottom right, in red, we see matrices that have
+FU along rows, and Registers along the columns, exactly again as with
+the CDC 6600 standard scoreboard: however, again, we note that
+because there are separate 32-bit FUs and separate 16-bit and 8-bit
+FUs, there are *three* separate sets of FU-to-Register Matrices.
+Also, note that these are separate, where they would be expected
+to be grouped together. Except, they're *not* independent, and that's
+where the diagram at the top (middle) comes in.
+
+The diagram at the top says, in words, "if you need a 32-bit register
+for an operation (using a 32-bit Function Unit), the 16-bit and 8-bit
+Function Units *also* connected to that exact same register **must**
+be prevented from proceeding. Also, if you need 8 bits of a register,
+whilst it does not prevent the other bytes of the register from being
+used, it *does* prevent the overlapping 16-bit portion **and the 32-bit
+and the 64-bit** portions of that same named register from being used".
+
+This "cascading" relationship is absolutely essential to understand.
+If you need Register R1 (all of it), you **cannot** go and allocate any of that
+register for use in any 32-bit, 16-bit or 8-bit operations. This is
+common sense! However, if you use the lowest byte (byte 1), you can still
+use the top three 16-bit portions of R1, and you can also still use byte 2.
+This is also common sense!
+
+So in fact, it's actually quite simple, and this "cascade" is simply and
+easily propagated down to the Function Unit Dependency Matrices, stopping
+32-bit operations from overwriting 8-bit and vice-versa.
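
To make the cascade concrete, here is a minimal sketch in Python. It is not taken from the actual design: it simply assumes a 64-bit register is tracked as eight bytes, that every 8/16/32/64-bit portion is naturally aligned, and that two portions conflict exactly when they share a byte. The names `byte_mask`, `conflicts` and `blocked_by` are made up for illustration, and portions are numbered from 0 here (so "byte 1" in the text above is index 0).

```python
WIDTHS = (64, 32, 16, 8)  # portion widths, in bits, of one 64-bit register


def byte_mask(width_bits: int, index: int) -> int:
    """Return a mask with one bit per byte covered by the given portion.

    Portions are assumed to be naturally aligned, so the index-th 16-bit
    portion covers bytes 2*index and 2*index + 1, and so on.
    """
    if width_bits not in WIDTHS:
        raise ValueError("width must be 8, 16, 32 or 64 bits")
    nbytes = width_bits // 8
    if not 0 <= index < 8 // nbytes:
        raise ValueError("portion index out of range for this width")
    return ((1 << nbytes) - 1) << (index * nbytes)


def conflicts(a: tuple, b: tuple) -> bool:
    """Two (width, index) portions conflict if they share at least one byte."""
    return bool(byte_mask(*a) & byte_mask(*b))


def blocked_by(request: tuple) -> list:
    """List every other aligned portion that overlaps the requested one."""
    blocked = []
    for width in WIDTHS:
        for index in range(64 // width):
            other = (width, index)
            if other != request and conflicts(request, other):
                blocked.append(other)
    return blocked


if __name__ == "__main__":
    # Using the lowest byte blocks the overlapping 16-, 32- and 64-bit
    # portions, but leaves the neighbouring byte and the upper portions free.
    print(blocked_by((8, 0)))
    # Using the whole register blocks every smaller portion.
    print(len(blocked_by((64, 0))))
```

Under these assumptions, asking for the lowest byte blocks only the three larger portions that contain it, whereas asking for the full 64 bits blocks all fourteen smaller ones.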
+
+The fourth part is the grid in green, in the top left corner. This is
+a "virtual" to "real" one-bit table. It's here because the size of
+these matrices is so enormous that there is deep concern about the line
+driver strength, as well as the actual size. 128 registers means
+that one single gate, when it goes high or low, has to "drive" the
+input of 128 other gates. That takes longer and longer to do, the higher
+the number of gates, so it becomes a critical factor in determining the
+maximum speed of the entire processor. We will have to keep an eye
+on this.
+
+So, to keep the FU to Register matrix size down, this "virtual" register
+concept was introduced. Only one bit in each row of the green table
+may be active: it says, for example, "IR1 actually represents that there
+is an instruction being executed using R3". This does mean, however, that
+if this table is not tall enough (not enough IRs), the processor has to
+stall until an instruction is completed, so that one virtual register
+becomes free. Again, another thing to keep an eye on, in simulations.
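
As a rough illustration of the green table, here is a small Python model. It is only a sketch under assumptions: the class name `VirtualRegTable` and its methods are hypothetical, each row is stored as an integer bitmask with at most one bit set, and every request simply takes the first free row (whether two in-flight instructions can share a row is not modelled).

```python
class VirtualRegTable:
    """Toy model of a virtual (IR) to real register one-bit table."""

    def __init__(self, n_virtual: int, n_real: int):
        self.n_real = n_real
        # One row per virtual register; 0 means the row is free, otherwise
        # exactly one bit is set, marking the real register it stands for.
        self.rows = [0] * n_virtual

    def allocate(self, real_reg: int):
        """Claim a free row for real_reg; return its index, or None to stall."""
        if not 0 <= real_reg < self.n_real:
            raise ValueError("no such real register")
        for ir, row in enumerate(self.rows):
            if row == 0:
                self.rows[ir] = 1 << real_reg
                return ir
        return None  # table full: the issue stage would have to stall

    def release(self, ir: int) -> None:
        """Free a row once the instruction using it has completed."""
        self.rows[ir] = 0

    def real_of(self, ir: int):
        """Return the real register a row points at, or None if it is free."""
        row = self.rows[ir]
        return row.bit_length() - 1 if row else None


if __name__ == "__main__":
    table = VirtualRegTable(n_virtual=2, n_real=128)
    ir = table.allocate(3)          # "this IR represents R3"
    print(ir, table.real_of(ir))
    table.allocate(100)
    print(table.allocate(5))        # None: both rows are busy, so stall
    table.release(ir)               # an instruction completes
    print(table.allocate(5))        # a row is free again
```

The point of the toy is the `None` return: with only two rows, a third request has nowhere to go until `release` is called, which is the stall condition described above.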
+
+The second major concern is the purple matrix: the FU-to-FU one. Basically,
+where previously FU1 would cover all ADDs, FU2 would cover all MUL
+operations, FU3 would cover BRANCH and so on, now we have to multiply those
+numbers by **four** (64-bit, 32-bit, 16-bit and 8-bit ops), which in turn
+means that the size of the FU-to-FU Matrix has gone up by a staggering
+**sixteen** times (four times the rows by four times the columns). This is
+not really acceptable, so we have to do something different.
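
The arithmetic behind that factor can be checked with a couple of lines of Python. This is purely illustrative: the function name `fu_fu_cells` is made up, and the count of three Function Units just reuses the ADD, MUL and BRANCH examples from the text rather than any real configuration.

```python
def fu_fu_cells(n_fus: int, n_widths: int = 1) -> int:
    """Number of cells in a square FU-to-FU matrix.

    Each Function Unit is duplicated once per operand width, and the
    matrix has one row and one column per resulting unit.
    """
    total_units = n_fus * n_widths
    return total_units * total_units


if __name__ == "__main__":
    before = fu_fu_cells(3)               # ADD, MUL, BRANCH at a single width
    after = fu_fu_cells(3, n_widths=4)    # the same, at 64, 32, 16 and 8 bits
    print(before, after, after // before)
```

Whatever the starting number of Function Units, the ratio comes out the same, because both the rows and the columns are multiplied by four.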