From: Luke Kenneth Casson Leighton
Date: Mon, 17 Dec 2018 15:56:41 +0000 (+0000)
Subject: add update 005
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=0170fa3e4a03e2e92d3651e563a3ce4b9ba5671b;p=crowdsupply.git

add update 005
---

diff --git a/updates/005_2018dec14_simd_without_simd.mdwn b/updates/005_2018dec14_simd_without_simd.mdwn
index ce952f7..69f19af 100644
--- a/updates/005_2018dec14_simd_without_simd.mdwn
+++ b/updates/005_2018dec14_simd_without_simd.mdwn
@@ -61,4 +61,66 @@ the Register and Function Unit Matrices? It looks like this:
 {{reorder_alias_bytemask_scheme.png}}
+So if you recall from the previous updates about Scoreboards, it's not
+the "scoreboard" that's the key, it's these Register to Function Unit
+and FU to FU Dependency Matrices that are the misunderstood key.
+So let's explain this diagram. Firstly, in purple in the bottom left
+is a massive matrix of FU to FU, just as with the standard CDC 6600,
+except now there are separate 32-bit FUs, 16-bit FUs, and 8-bit FUs.
+In this way, we can have 32-bit ADD depending on and waiting for
+an 8-bit computation, or 16-bit MUL on a 32-bit SQRT and so on. Nothing
+immediately obviously different there.
+Likewise, in the bottom right, in red, we see matrices that have
+FU along rows, and Registers along the columns, exactly again as with
+the CDC 6600 standard scoreboard: however, again, we note that
+because there are separate 32-bit FUs and separate 16-bit and 8-bit
+FUs, there are *three* separate sets of FU-to-Register Matrices.
+Also, note that these are separate, where they would be expected
+to be grouped together. Except, they're *not* independent, and that's
+where the diagram at the top (middle) comes in.
+
+The diagram at the top says, in words, "if you need a 32-bit register
+for an operation (using a 32-bit Function Unit), the 16-bit and 8-bit
+Function Units *also* connected to that exact same register **must**
+be prevented from occurring.
+Also, if you need 8 bits of a register,
+whilst it does not prevent the other bytes of the register from being
+used, it *does* prevent the overlapping 16-bit portion **and the 32-bit
+and the 64-bit** portions of that same named register from being used".
+
+This is absolutely essential to understand, this "cascading" relationship.
+If you need Register R1 (all of it), you **cannot** go and allocate any of that
+register for use in any 32-bit, 16-bit or 8-bit operations. This is
+common sense! However, if you use the lowest byte (byte 1), you can still
+use the top three 16-bit portions of R1, and you can also still use byte 2.
+This is also common sense!
+
+So in fact, it's actually quite simple, and this "cascade" is simply and
+easily propagated down to the Function Unit Dependency Matrices, stopping
+32-bit operations from overwriting 8-bit and vice-versa.
+
+The fourth part is the grid in green, in the top left corner. This is
+a "virtual" to "real" one-bit table. It's here because the size of
+these matrices is so enormous that there is deep concern about the line
+driver strength, as well as the actual size. 128 registers means
+that one single gate, when it goes high or low, has to "drive" the
+input of 128 other gates. That takes longer and longer to do, the higher
+the number of gates, so it becomes a critical factor in determining the
+maximum speed of the entire processor. We will have to keep an eye
+on this.
+
+So, to keep the FU to Register matrix size down, this "virtual" register
+concept was introduced. Only one bit in each row of the green table
+may be active: it says, for example, "IR1 actually represents that there
+is an instruction being executed using R3". This does mean however that
+if this table is not tall enough (not enough IRs), the processor has to
+stall until an instruction is completed, so that one register becomes
+free. Again, another thing to keep an eye on, in simulations.
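The "cascading" relationship described earlier can be sketched as a byte-mask overlap check. This is only an illustration in Python, not the project's hardware description: the names `byte_mask`, `conflicts` and `blocked_by` are hypothetical, and the sketch assumes 64-bit registers whose 32-bit, 16-bit and 8-bit portions are naturally aligned. Portions are numbered from 0 here, so "byte 1" in the text is index 0.

```python
REG_BITS = 64  # assumption: each named register is 64 bits wide (8 byte lanes)
WIDTHS = (64, 32, 16, 8)

def byte_mask(width_bits, index):
    """Return a bitmask of the byte lanes used by portion `index` of the given width."""
    if width_bits not in WIDTHS:
        raise ValueError("unsupported width")
    if not 0 <= index < REG_BITS // width_bits:
        raise ValueError("portion index out of range")
    nbytes = width_bits // 8
    # nbytes consecutive lanes, shifted up to the portion's position
    return ((1 << nbytes) - 1) << (index * nbytes)

def conflicts(a, b):
    """Two (width, index) portions of the same register conflict if they share a byte."""
    return (byte_mask(*a) & byte_mask(*b)) != 0

def blocked_by(used):
    """List every (width, index) portion that overlaps the portion in use."""
    return [(w, i)
            for w in WIDTHS
            for i in range(REG_BITS // w)
            if conflicts(used, (w, i))]

# Using the lowest byte blocks the 64-bit, 32-bit and 16-bit portions that contain it.
lowest_byte = (8, 0)
blocked = blocked_by(lowest_byte)
# The next byte up and the top three 16-bit portions remain free.
next_byte_free = not conflicts(lowest_byte, (8, 1))
upper_halfwords_free = all(not conflicts(lowest_byte, (16, i)) for i in (1, 2, 3))
```

Under these assumptions, asking for the whole 64-bit register overlaps every smaller portion, which is the "need all of R1" case from the text.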
+
+The second major concern is the purple matrix: the FU-to-FU one. Basically
+where previously we would have FU1 cover all ADDs, FU2 would cover all MUL
+operations, FU3 covers BRANCH and so on, now we have to multiply those
+numbers by **four** (64-bit ops, 32-bit ops, 16-bit and 8-bit), which in turn
+means that the size of the FU-to-FU Matrix has gone up by a staggering
+**sixteen** times. This is not really acceptable, so we have to do something
+different.
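The factor of sixteen follows from the FU-to-FU matrix being square: four times as many Function Units means four times as many rows and four times as many columns. A small arithmetic sketch, where the base count of three FUs (ADD, MUL, BRANCH) is just the example from the text and `fu_to_fu_cells` is a hypothetical helper:

```python
def fu_to_fu_cells(num_fus):
    """Number of cells in a square FU-to-FU dependency matrix."""
    return num_fus * num_fus

base_fus = 3               # illustrative only: ADD, MUL, BRANCH
widths = (64, 32, 16, 8)   # one copy of each FU per operand width

before = fu_to_fu_cells(base_fus)               # 3 x 3 cells
after = fu_to_fu_cells(base_fus * len(widths))  # 12 x 12 cells
growth = after // before                        # (4 * n)^2 / n^2 = 16, whatever n is

print(f"{before} cells -> {after} cells, {growth}x larger")
```

The growth factor does not depend on the base number of FUs, only on the number of widths squared.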