From ff9d3c148e80b4a038a356caf5f6e90d0fb516a3 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 4 Dec 2018 00:07:05 +0000
Subject: [PATCH] add discussion

---
 3d_gpu/microarchitecture.mdwn | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn
index bd5a0ac87..e244c3ab7 100644
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -107,6 +107,29 @@
 LDs write. You will find doing VRFs a lot more compact this way. In
 GPU land we called the flip-flops orchestrating the timing "collectors".
 
+----
+
+For GPU workloads FP64 is not common, so I think having one FP64 ALU
+would be sufficient. Since indexed loads and stores are not supported,
+it will be important to support 4x64 integer operations to generate
+addresses for loads/stores.
+
+I was thinking we would use scoreboarding to keep track of operations
+and dependencies, since it doesn't need a CAM per ALU. We should be
+able to design it to forward results past the register file, allowing
+zero-latency forwarding. Combined with register renaming, this should
+eliminate most WAR and WAW data hazards.
+
+I think branch prediction will be essential, if only for instruction
+fetch and decode, since it substantially reduces the branch penalty.
+
+Note that even with a zero-overhead loop extension, branch prediction
+will still be useful, since we will want to run general-purpose code
+such as compilers and standard RV code with decent performance.
+Additionally, quite a few shaders branch inside their inner loops, so
+zero-overhead loops won't be able to fix all of the branching problems.
+
+
 # References
 
 * 
-- 
2.30.2
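The scoreboarding-plus-renaming idea in the discussion above can be illustrated with a minimal Python sketch. It is a behavioural model under stated assumptions, not the Libre-SOC design: `Instr`, `Renamer` and `Scoreboard` are hypothetical names, the free list is a simple FIFO with no register reclamation, and result forwarding is not modelled. Unlike a classic CDC 6600 scoreboard, which tracks WAR and WAW hazards itself, here renaming gives every write a fresh physical register so that only RAW hazards remain. The scoreboard then reduces to one "pending write" bit per physical register, a plain bit-vector rather than a CAM per ALU.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """Hypothetical 3-operand instruction using architectural registers."""
    dest: int
    srcs: tuple

@dataclass
class RenamedInstr:
    """The same instruction after renaming, using physical registers."""
    dest: int
    srcs: tuple

class Renamer:
    """Minimal rename stage: a map table plus a FIFO free list (no reclamation)."""
    def __init__(self, num_arch: int, num_phys: int):
        # Initially, architectural register i lives in physical register i.
        self.map_table = list(range(num_arch))
        self.free_list = list(range(num_arch, num_phys))

    def rename(self, ins: Instr) -> RenamedInstr:
        # Read sources through the current mapping first, so true (RAW)
        # dependencies are preserved.
        srcs = tuple(self.map_table[s] for s in ins.srcs)
        if not self.free_list:
            raise RuntimeError("out of physical registers: rename must stall")
        # Every write gets a fresh physical register: a later writer can never
        # clobber a value an earlier reader still needs (no WAR), and two
        # writers never target the same register (no WAW).
        new_dest = self.free_list.pop(0)
        self.map_table[ins.dest] = new_dest
        return RenamedInstr(new_dest, srcs)

class Scoreboard:
    """One 'pending write' bit per physical register: a bit-vector, not a CAM."""
    def __init__(self, num_phys: int):
        self.pending = [False] * num_phys

    def can_issue(self, ins: RenamedInstr) -> bool:
        # After renaming only RAW hazards remain: wait until all sources are ready.
        return not any(self.pending[s] for s in ins.srcs)

    def issue(self, ins: RenamedInstr) -> None:
        self.pending[ins.dest] = True

    def writeback(self, phys_reg: int) -> None:
        self.pending[phys_reg] = False

# Demo: three instructions containing RAW, WAR and WAW hazards.
r = Renamer(num_arch=4, num_phys=8)
prog = [
    Instr(dest=1, srcs=(2, 3)),  # r1 = r2 op r3
    Instr(dest=2, srcs=(1, 0)),  # r2 = r1 op r0  (RAW on r1, WAR on r2)
    Instr(dest=1, srcs=(3, 3)),  # r1 = r3 op r3  (WAW on r1)
]
renamed = [r.rename(i) for i in prog]
print([x.dest for x in renamed])  # [4, 5, 6]: every write has its own register

sb = Scoreboard(num_phys=8)
sb.issue(renamed[0])
print(sb.can_issue(renamed[1]))  # False: waits on the RAW dependency (p4)
print(sb.can_issue(renamed[2]))  # True: the WAW on r1 no longer blocks it
```

In this model the third instruction may issue ahead of the second even though both touch `r1`/`r2` architecturally. That is the kind of reordering the discussion expects renaming to enable while keeping the scoreboard itself small.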