From 454dacbb79174dbd3db91cee5edf4839a1ca869b Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Mon, 10 Apr 2023 15:04:47 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls012.mdwn | 46 +++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index aaf55281c..0e903519c 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -405,6 +405,52 @@ EXT022 Sandbox.
 
 **How many registers does it need?**
 
+The basic RISC Paradigm is not only to make instruction encoding simple (often
+"wasting" encoding space compared to highly-compacted ISAs such as x86), but
+also to keep the number of registers used down to a minimum.
+
+Counter-examples include FMAC, which had to be added to IEEE754 because the
+*internal* product requires more accuracy than can fit into a register.
+Another is a dotproduct instruction, which again requires an accumulator
+of at least double the width of the two vector inputs.  And in the AMDGPU
+ISA, there are Texture-mapping instructions taking up to an astounding
+*twelve* input operands!
+
+Going too far in minimising registers, however, has to be traded off against
+the next question. Both MIPS and RISC-V lack Condition Codes, which means that
+emulating x86 Branch-Conditional requires *ten* MIPS instructions.
+
+The downside of creating too complex instructions is that the Dependency Hazard
+Management in high-performance multi-issue out-of-order microarchitectures
+becomes infeasibly large, and even simple in-order systems may have performance
+severely compromised by an overabundance of stalls.  Also worth remembering
+is that register file ports are insanely costly, not just to design: they
+also use considerable power.
+
+That said, there do exist genuine reasons why more registers are better than fewer:
+Compare-and-Swap has huge benefits but is costly to implement, and DCT/FFT Twin-Butterfly
+instructions allow creation of in-place in-register algorithms reducing the number
+of registers needed and thus saving power due to making the *overall* algorithm
+more efficient, as opposed to micro-focussing on a localised power increase.
+
+**Can other existing instructions (plural) do the same job?**
+
+The general rule is: if two or more existing instructions can do the same
+job, leave the new instruction out... *unless* its absence occurs so often
+that it causes huge increases in binary size.  RISC-V has gone too far in
+this regard, as explained here:
+<https://news.ycombinator.com/item?id=24459314>
+
+Good examples are LD-ST-Indexed-shifted (multiply RB by 2, 4, 8 or 16)
+which are high-priority instructions in x86 and ARM, but lacking in
+Power ISA, MIPS, and RISC-V. With many critical hot-loops in Computer
+Science having to perform shift and add as explicit instructions, adding
+LD/ST-shifted should be considered high priority, except that the sheer
+*number* of such instructions needing to be added takes us into the next
+question.
+
+**How costly is the encoding?**
+
 
 
 # Tables
-- 
2.30.2