add RVV spec link

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Wed, 5 Dec 2018 03:57:47 +0000 (03:57 +0000)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Wed, 5 Dec 2018 03:57:47 +0000 (03:57 +0000)
diff --git a/3d_gpu/microarchitecture.mdwn b/3d_gpu/microarchitecture.mdwn

index 6e4eb9522e5b1e43e74a9d83cd0658b3b9ca0afa..c2f3060bc80ef202a81c4c421f00deea296ee83c 100644 (file)
--- a/3d_gpu/microarchitecture.mdwn
+++ b/3d_gpu/microarchitecture.mdwn
@@ -109,6 +109,23 @@ called the flip-flops orchestrating the timing "collectors".
  
  ----
  
+Justification for Branch Prediction
+
+<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000212.html>
+
+We can combine several branch predictors to make a decent predictor:
+call/return predictor -- important as it can predict calls and returns with around 99.8% accuracy;
+loop predictor -- basically counts loop iterations;
+some kind of global predictor -- handles everything else.
+
+We will also want a BTB (branch target buffer); a smaller one will work. It
+reduces the average branch cycle count from 2-3 to 1, since it predicts which
+instructions are taken branches while the instructions are still being fetched,
+allowing the fetch to go to the target address on the next clock rather
+than having to wait for the fetched instructions to be decoded.
+
+----
+
  For GPU workloads FP64 is not common so I think having 1 FP64 alu would
  be sufficient. Since indexed loads and stores are not supported, it will
  be important to support 4x64 integer operations to generate addresses
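The BTB argument in the added text (a taken branch costs 1 cycle on a BTB hit instead of waiting for decode) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the project's design: the names `SmallBTB` and `branch_cycles`, the direct-mapped organisation, the 16-entry size, the 4-byte instruction width and the 2-cycle miss cost (the low end of the "2-3" quoted above) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class SmallBTB:
    """Direct-mapped branch target buffer (hypothetical illustrative model)."""
    entries: int = 16  # assumed size; the source only says "a smaller one will work"
    table: Dict[int, Tuple[int, int]] = field(default_factory=dict)  # index -> (tag, target)

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the low two PC bits carry no information.
        return (pc >> 2) % self.entries

    def lookup(self, pc: int) -> Optional[int]:
        """Return the predicted target if pc is a known taken branch, else None."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        """Record a taken branch once it has been decoded and resolved."""
        self.table[self._index(pc)] = (pc, target)


def branch_cycles(btb: SmallBTB,
                  taken_branches: Iterable[Tuple[int, int]],
                  decode_penalty: int = 2) -> int:
    """Total cycles spent on a sequence of taken branches given as (pc, target).

    A BTB hit with the correct target costs 1 cycle, because fetch is
    redirected on the next clock. Anything else costs `decode_penalty`
    cycles (an assumed value), because the redirect waits for decode.
    """
    cycles = 0
    for pc, target in taken_branches:
        if btb.lookup(pc) == target:
            cycles += 1
        else:
            cycles += decode_penalty
            btb.update(pc, target)
    return cycles
```

For example, a loop-closing branch at `0x100` that jumps back to `0x80` ten times misses once and then hits nine times, so `branch_cycles(SmallBTB(), [(0x100, 0x80)] * 10)` gives 11 cycles in this model, against 20 with no BTB at all. Two branches whose addresses map to the same entry evict each other, which is the main cost of keeping the table small.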