From: Luke Kenneth Casson Leighton
Date: Tue, 20 Nov 2018 18:52:11 +0000 (+0000)
Subject: add overlap reg discussion
X-Git-Tag: convert-csv-opcode-to-binary~4836
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=ce8ddb63917ea060a19f9dcd9bbabaf6f848b60c;p=libreriscv.git

add overlap reg discussion
---

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn
index 0188ca18e..4cab848e0 100644
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -73,6 +73,38 @@ too much.
 Yeah pretty much, though I had meant the bank number comes from the
 least-significant bits of the 7-bit register number.
 
+----
+
+Assuming 64-bit operands:
+If you could organize 2 SRAM macros and use the pair of them to
+read/write 4 registers at a time (256-bits). The pipeline will allow you to
+dedicate 3 cycles for reading and 1 cycle for writing (4 registers each).
+
+RS1 = Read of operand S1
+WRd = Write of result Dst
+FMx = Floating Point Multiplier, x = stage.
+
+    |RS1|RS2|RS3|FWD|FM1|FM2|FM3|FM4|
+                    |FWD|FM1|FM2|FM3|FM4|
+                        |FWD|FM1|FM2|FM3|FM4|
+                            |FWD|FM1|FM2|FM3|FM4|WRd|
+                    |RS1|RS2|RS3|FWD|FM1|FM2|FM3|FM4|
+                                    |FWD|FM1|FM2|FM3|FM4|
+                                        |FWD|FM1|FM2|FM3|FM4|
+                                            |FWD|FM1|FM2|FM3|FM4|WRd|
+                                    |RS1|RS2|RS3|FWD|FM1|FM2|FM3|FM4|
+                                                    |FWD|FM1|FM2|FM3|FM4|
+                                                        |FWD|FM1|FM2|FM3|FM4|
+                                                            |FWD|FM1|FM2|FM3|FM4|WRd|
+
+The only trick is getting the read and write dedicated on different clocks.
+When the RS3 operand is not needed (60% of the time) you can use
+the time slot for reading or writing on behalf of memory refs; STs read,
+LDs write.
+
+You will find doing VRFs a lot more compact this way. In GPU land we
+called the flip-flops orchestrating the timing "collectors".
+
 # References
 
 *
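
The timing claim in the added text (3 read cycles and 1 write cycle, never on the same clock) can be checked with a small model. The following is only a sketch under stated assumptions, not part of the commit: it assumes the four operations of a group enter FWD one cycle apart, that a new group starts every 4 cycles, and that the single WRd for a group follows the last FM4. The names `group_schedule` and `port_usage` are hypothetical.

```python
# Sketch of the read/write slot timing described in the patch text.
# Assumptions (not stated explicitly in the source): ops in a group are
# staggered by one cycle, and groups repeat every 3 read + 1 write cycles.
READ_STAGES = ("RS1", "RS2", "RS3")
FM_STAGES = ("FM1", "FM2", "FM3", "FM4")
GROUP_SIZE = 4                      # 4 registers read/written at a time
PERIOD = len(READ_STAGES) + 1       # 3 read cycles + 1 write cycle


def group_schedule(group):
    """Return {cycle: [stage names]} for one group of GROUP_SIZE operations."""
    base = group * PERIOD
    events = {}
    # Three wide reads fetch operands S1, S2, S3 for all four operations.
    for offset, stage in enumerate(READ_STAGES):
        events.setdefault(base + offset, []).append(stage)
    # Each operation is forwarded one cycle after the previous one.
    for op in range(GROUP_SIZE):
        fwd = base + len(READ_STAGES) + op
        events.setdefault(fwd, []).append("FWD")
        for k, stage in enumerate(FM_STAGES, start=1):
            events.setdefault(fwd + k, []).append(stage)
    # One wide write after the last operation leaves FM4.
    last_fwd = base + len(READ_STAGES) + GROUP_SIZE - 1
    events.setdefault(last_fwd + len(FM_STAGES) + 1, []).append("WRd")
    return events


def port_usage(num_groups):
    """Map each cycle to 'R' or 'W'; raise if two accesses share a cycle."""
    usage = {}
    for g in range(num_groups):
        for cycle, stages in group_schedule(g).items():
            for stage in stages:
                if stage in READ_STAGES:
                    kind = "R"
                elif stage == "WRd":
                    kind = "W"
                else:
                    continue        # FWD/FMx do not touch the register file
                if cycle in usage:
                    raise ValueError(f"port conflict at cycle {cycle}")
                usage[cycle] = kind
    return usage


if __name__ == "__main__":
    usage = port_usage(3)
    # One character per cycle: R = read, W = write, . = idle slot
    print("".join(usage.get(c, ".") for c in range(max(usage) + 1)))
    # prints "RRR.RRR.RRRW...W...W"
```

Under these assumptions every write lands on a cycle congruent to 3 modulo 4 and every read on the other three, so the two never share a clock. The idle slots in the output are where the text suggests servicing memory references.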