From: lkcl
Date: Mon, 10 Apr 2023 14:29:06 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls012_v1~26
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=0b8b1d4c9a738f8732dd80ec0c61dec963867497;p=libreriscv.git

---

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index 0e903519c..c24eaff7e 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -433,6 +433,14 @@ instructions allow creation of in-place in-register algorithms reducing the numb
 of registers needed and thus saving power due to making the *overall* algorithm
 more efficient, as opposed to micro-focussing on a localised power increase.
 
+**How many register files does it use?**
+
+Complex instructions pulling in data from multiple register files can create unnecessary
+issues surrounding Dependency Hazard Management in Out-of-Order systems. As a general
+rule it is better to keep complex instructions reading and writing to the same
+register file, relying on much simpler (1-in 1-out) instructions to transfer data
+between register files.
+
 **Can other existing instructions (plural) do the same job**
 
 The general
@@ -451,7 +459,46 @@ question
 
 **How costly is the encoding?**
 
-
+The cost can come either from a single instruction that is costly (several
+operands or a few long ones) or from a
+group of simpler ones that, purely due to their number, increase overall
+encoding cost. An example of an extremely costly instruction would be
+one with its own Primary Opcode: addi is a good candidate. However
+the sheer overwhelming
+number of times that instruction is used easily makes a case for its inclusion.
+
+Mentioned above was Load-Store-Indexed-Shifted, which only needs 2 bits
+to specify how much to shift: x2, x4, x8 or x16. And they all use a 10-bit XO
+Field, so not that costly for any one given instruction.
+Unfortunately there are *around 30* Load-Store-Indexed Instructions in the Power ISA,
+which means an extra *five* bits of precious XO space are taken up.
+Then let us not forget
+the two needed for the Shift amount. The group is now, in effect, a *three* bit XO (10 - 5 - 2).
+
+Is this a worthwhile tradeoff? Honestly it could well be. And that's the decision
+process that the OpenPOWER ISA Working Group could use some assistance on, to make
+the evaluation easier.
+
+**How many gates does it need?**
+
+`grevlut` comes in at an astonishing 20,000 gates, where for comparison an FP64
+Multiply typically takes between 12,000 and 15,000. Not counting the cost in hardware
+terms is just asking for trouble.
+
+**How long will it take to complete?**
+
+Divide and Transcendentals are so complex that simple
+implementations can often take an astounding 128 clock cycles to complete.
+Other instructions waiting for the results back up and eventually stall,
+whereas in-order systems just stall straight away.
+
+**Summary**
+
+There are many tradeoffs here, and it is a huge list of considerations: if you know of
+any others, please do submit feedback so that they may be included here.
+Then the evaluation process may take place: again, constructive feedback on
+which instructions are a priority is also appreciated. The above
+helps explain the columns in the tables that follow.
 
 # Tables