From: Luke Kenneth Casson Leighton
Date: Tue, 4 Dec 2018 06:16:03 +0000 (+0000)
Subject: record conversation snippet
X-Git-Tag: convert-csv-opcode-to-binary~4817
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a1764222511309b03d453345d90eb21965de951a;p=libreriscv.git

record conversation snippet
---

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn
index e244c3ab7..424986b3d 100644
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -129,6 +129,23 @@ like compilers and standard RV code with decent performance.
 Additionally, quite a few shaders have branching in their internal loops so
 zero-overhead loops won't be able to fix all the branching problems.
 
+----
+
+> you would need a 4-wide cdb anyway, since that's the performance we're
+> trying for.
+
+ if the 32-bit ops can be grouped as 2x SIMD to a 64-bit-wide ALU,
+then only 2 such ALUs would be needed to give 4x 32-bit FP per cycle
+per core, which means only a 2-wide CDB, a heck of a lot better than
+4.
+
+ oh: i thought of another way to cut the power-impact of the Reorder
+Buffer CAMs: a simple bit-field (a single-bit 2RWW memory, of address
+length equal to the number of registers, 2 is because of 2-issue).
+
+ the CAM of a ROB is on the instruction destination register. key:
+ROBnum, value: instr-dest-reg. if you have a bitfield that says "this
+destreg has no ROB tag", it's dead-easy to check that bitfield, first.
 
 # References
 
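
Note on the added snippet: the bit-field idea (check a one-bit-per-register "has a ROB tag" flag before searching the Reorder Buffer CAM) can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not code from the libreriscv repository: the class `ReorderBuffer`, its methods, and the `cam_searches` counter are all hypothetical names, and the per-register `pending` count used to decide when a bit may be cleared is a modelling shortcut that the conversation does not describe.

```python
class ReorderBuffer:
    """Toy behavioural model of a ROB whose CAM maps ROB number -> dest register.

    All names here are hypothetical; this is not a hardware description.
    """

    def __init__(self, num_regs, num_entries):
        self.num_entries = num_entries
        self.dest_reg = {}                 # CAM contents: ROB number -> dest register
        self.has_tag = [False] * num_regs  # the bit-field: one bit per register
        self.pending = [0] * num_regs      # assumption: in-flight writers per register
        self.cam_searches = 0              # counts the expensive CAM searches
        self.next_rob = 0                  # model shortcut: ROB numbers never wrap

    def allocate(self, reg):
        """Issue an instruction writing `reg`; return its ROB number."""
        if len(self.dest_reg) >= self.num_entries:
            raise RuntimeError("ROB full")
        rob = self.next_rob
        self.next_rob += 1
        self.dest_reg[rob] = reg
        self.pending[reg] += 1
        self.has_tag[reg] = True           # register now has a ROB tag
        return rob

    def retire(self, rob):
        """Commit an entry; clear the bit once no writer of that register remains."""
        reg = self.dest_reg.pop(rob)
        self.pending[reg] -= 1
        self.has_tag[reg] = self.pending[reg] > 0

    def lookup(self, reg):
        """Return the newest ROB number writing `reg`, or None if there is none."""
        if not self.has_tag[reg]:
            return None                    # cheap path: bit-field says "no ROB tag"
        self.cam_searches += 1             # only now is the CAM searched
        return max(r for r, d in self.dest_reg.items() if d == reg)
```

In this model a `lookup` on a register with no in-flight writer returns without touching `cam_searches`, which is the power saving the snippet is after; how real hardware would keep the bit correct when several in-flight instructions target the same register is left open here.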