From 0f6dda39464ead5b0dddd2968f9224aad9da6bfb Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Wed, 5 Dec 2018 04:51:36 +0000
Subject: [PATCH] add conversation notes

---
 3d_gpu/microarchitecture.mdwn | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn
index c2f3060bc..b496631ff 100644
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -126,6 +126,20 @@ than having to wait for the fetched instructions to be decoded.
 
 ----
 
+> https://www.researchgate.net/publication/316727584_A_case_for_standard-cell_based_RAMs_in_highly-ported_superscalar_processor_structures
+
+well, there is this concept:
+https://www.princeton.edu/~rblee/ELE572Papers/MultiBankRegFile_ISCA2000.pdf
+
+it is a 2-level hierarchy for register cacheing. honestly, though, the
+reservation stations of the tomasulo algorithm are similar to a cache,
+although only of the intermediate results, not of the initial operands.
+
+i have a feeling we should investigate putting a 2-level register cache
+in front of a multiplexed SRAM.
+
+----
+
 For GPU workloads FP64 is not common so I think having 1 FP64 alu would
 be sufficient. Since indexed loads and stores are not supported, it will
 be important to support 4x64 integer operations to generate addresses
@@ -240,3 +254,4 @@ Reorder Buffer Entry
 * Discussion
 * 
 * 
+* 
-- 
2.30.2