From a1764222511309b03d453345d90eb21965de951a Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 4 Dec 2018 06:16:03 +0000
Subject: [PATCH] record conversation snippet

---
 3d_gpu/microarchitecture.mdwn | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn
index e244c3ab7..424986b3d 100644
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -129,6 +129,23 @@ like compilers and standard RV code with decent performance.
 Additionally, quite a few shaders have branching in their internal loops
 so zero-overhead loops won't be able to fix all the branching problems.
 
+----
+
+> you would need a 4-wide cdb anyway, since that's the performance we're
+> trying for.
+
+ if the 32-bit ops can be grouped as 2x SIMD to a 64-bit-wide ALU,
+then only 2 such ALUs would be needed to give 4x 32-bit FP per cycle
+per core, which means only a 2-wide CDB, a heck of a lot better than
+4.
+
+ oh: i thought of another way to cut the power-impact of the Reorder
+Buffer CAMs: a simple bit-field (a single-bit 2RWW memory, of address
+length equal to the number of registers, 2 is because of 2-issue).
+
+ the CAM of a ROB is on the instruction destination register. key:
+ROBnum, value: instr-dest-reg. if you have a bitfield that says "this
+destreg has no ROB tag", it's dead-easy to check that bitfield, first.
 
 # References
 
-- 
2.30.2
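The bit-field filter from the snippet can be illustrated with a minimal behavioural sketch in Python, assuming a conventional ROB in which source operands are looked up by destination register. `FilteredRob`, `RobEntry`, `issue`, `lookup`, `commit` and `pending` are hypothetical names, not taken from any existing codebase. The sequential scan stands in for the CAM's parallel match, and the rule for clearing a bit at commit is an assumption, since the snippet does not say when bits are cleared.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    tag: int       # ROB number ("key" in the snippet's description)
    dest_reg: int  # architectural destination register ("value")


class FilteredRob:
    """Behavioural model only (hypothetical names): a ROB whose dest-reg CAM
    search is skipped when a one-bit-per-register "pending" field is clear."""

    def __init__(self, num_regs: int) -> None:
        self.entries = deque()             # in-flight RobEntry objects, oldest first
        self.pending = [False] * num_regs  # bit set => some ROB entry targets this reg
        self.next_tag = 0
        self.cam_searches = 0              # how often the CAM would actually fire

    def issue(self, dest_reg: int) -> int:
        """Allocate a ROB entry for an instruction writing dest_reg."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(RobEntry(tag, dest_reg))
        self.pending[dest_reg] = True
        return tag

    def lookup(self, src_reg: int) -> Optional[int]:
        """Tag of the newest in-flight producer of src_reg, or None if the
        operand can be read straight from the register file."""
        if not self.pending[src_reg]:
            return None                    # one bit read, no CAM search
        self.cam_searches += 1
        for entry in reversed(self.entries):  # stands in for the parallel CAM match
            if entry.dest_reg == src_reg:
                return entry.tag
        return None

    def commit(self) -> RobEntry:
        """Retire the oldest entry, clearing its bit only if no younger
        in-flight entry writes the same register (assumed policy)."""
        entry = self.entries.popleft()
        if not any(e.dest_reg == entry.dest_reg for e in self.entries):
            self.pending[entry.dest_reg] = False
        return entry


if __name__ == "__main__":
    rob = FilteredRob(num_regs=32)
    t0 = rob.issue(dest_reg=5)
    print(rob.lookup(3))        # None: r3's bit is clear, CAM not searched
    print(rob.lookup(5) == t0)  # True: bit set, one CAM search
    rob.commit()
    print(rob.lookup(5))        # None: the only writer of r5 has retired
    print(rob.cam_searches)     # 1
```

The bit only has to be conservative: a stale "set" bit costs one CAM search that finds nothing, whereas a bit cleared too early would send a stale register-file value to the consumer, which is why the sketch keeps the bit set while any younger entry still targets the register. Flush recovery is not modelled, and neither are port counts (the snippet ties the "2" in its port description to 2-issue).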