From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Tue, 9 Oct 2018 11:43:19 +0000 (+0100)
Subject: add notes from libre-riscv-dev conversation
X-Git-Tag: convert-csv-opcode-to-binary~4969
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=d3d9dd0acbc4932af9735d4089d847eb3a7c0c9d;p=libreriscv.git

add notes from libre-riscv-dev conversation
---

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn
new file mode 100644
index 000000000..7b095bd95
--- /dev/null
+++ b/3d_gpu/microarchitecture.mdwn
@@ -0,0 +1,18 @@
+I don't know about power; however, I have done some research, and a 4 Kbyte
+(or 16, I can't recall) SRAM (what I was thinking of for a tile buffer) takes
+in the ballpark of 1000 um^2 in 28nm.
+Using a 4xFMA with a banked register file, where the bank is selected by the
+low-order bits of the register number, means we could probably get away with
+1R1W SRAM as the backing memory for the register file, similarly to Hwacha. I
+would suggest 8 banks, which lets us do more in parallel since other units
+could run alongside a 4xFMA. 8 banks would also allow us to clock-gate the
+SRAM banks that are not in use in the current clock cycle, saving more
+power. Note that the 4xFMA could be 4 separately
+allocated FMA units; it doesn't have to be SIMD-style. If we have enough
+hardware parallelism, we can under-volt and under-clock the GPU cores, making
+for a more efficient GPU. If we are using the GPU cores as CPU cores as well, I
+think it would be important to be able to use a faster clock speed when not
+using the extended registers (similar to how Intel processors use a lower
+clock rate when AVX-512 is in use) so that scalar code is not slowed down
+too much.
+