add bypass

diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn

index de5fc334640d9df71d90246726ce16c7db872769..665dde99ab8e995207f599f1ad0850de4e7971c8 100644
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -80,6 +80,7 @@ If you could organize 2 SRAM macros and use the pair of them to
  read/write 4 registers at a time (256-bits). The pipeline will allow you to
  dedicate 3 cycles for reading and 1 cycle for writing (4 registers each).
  
+<pre>
  RS1 = Read of operand S1
  WRd = Write of result Dst
  FMx = Floating Point Multiplier, x = stage.
@@ -96,6 +97,7 @@ FMx = Floating Point Multiplier, x = stage.
                                                     |FWD|FM1|FM2|FM3|FM4|
                                                         |FWD|FM1|FM2|FM3|FM4|
                                                             |FWD|FM1|FM2|FM3|FM4|WRd|
+</pre>
  
  The only trick is getting the read and write dedicated on different clocks.
  When the RS3 operand is not needed (60% of the time) you can use
@@ -109,6 +111,11 @@ called the flip-flops orchestrating the timing "collectors".
  
  * <https://en.wikipedia.org/wiki/Tomasulo_algorithm>
  * <https://en.wikipedia.org/wiki/Reservation_station>
+* <https://en.wikipedia.org/wiki/Register_renaming> points out that
+  reservation stations take a *lot* of power.
+* <https://en.wikipedia.org/wiki/Classic_RISC_pipeline#Solution_A._Bypassing>
+  pipeline bypassing
  * Register File Bank Cacheing <https://www.princeton.edu/~rblee/ELE572Papers/MultiBankRegFile_ISCA2000.pdf>
  * Discussion <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-November/000157.html>
  * <https://github.com/UCSBarchlab/PyRTL/blob/master/examples/example5-instrospection.py>
+* <https://github.com/ataradov/riscv/blob/master/rtl/riscv_core.v#L210>
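
Reviewer note: the pipeline-bypassing link added in this commit describes forwarding a result that has not yet been written back to the register file. The following is a minimal Python sketch of that idea only, assuming a classic five-stage pipeline as in the linked Wikipedia article and the RISC-V convention that register 0 always reads as zero. It is not taken from the libre-riscv design or from the linked `riscv_core.v`; the names `read_operand`, `ex_mem`, and `mem_wb` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# A pending write held in a pipeline latch: (destination register, value).
# None means the stage has no register result in flight.
Pending = Optional[Tuple[int, int]]


def read_operand(reg: int, regfile: Sequence[int],
                 ex_mem: Pending, mem_wb: Pending) -> int:
    """Return the value of `reg`, preferring in-flight results (bypass)."""
    if reg == 0:
        return 0  # assumption: RISC-V x0 is hardwired to zero
    # The youngest producer wins: check the EX/MEM latch first.
    if ex_mem is not None and ex_mem[0] == reg:
        return ex_mem[1]
    # Then the older result waiting in MEM/WB.
    if mem_wb is not None and mem_wb[0] == reg:
        return mem_wb[1]
    # No pending write to this register: the register file is up to date.
    return regfile[reg]


regfile = [0] * 32
regfile[5] = 10  # stale value still sitting in the register file
print(read_operand(5, regfile, ex_mem=(5, 42), mem_wb=(5, 7)))
```

Checking the younger latch first matters when two in-flight instructions target the same register: the consumer must see the most recent result, not the older one that is about to be written back.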