From: Luke Kenneth Casson Leighton
Date: Thu, 13 Apr 2023 09:37:52 +0000 (+0100)
Subject: insights into instruction design
X-Git-Tag: opf_rfc_ls010_v1~16
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=2b1101c25f8866e8bb9b5777c8c1c94f465fc5e9;p=libreriscv.git

insights into instruction design
---
diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index e2e19ae9f..3c76bfc1f 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -573,10 +573,21 @@ will back up and eventually stall, where in-order
 systems pretty much just stall straight away.
 
 Less extreme examples include instructions that take only a few cycles
-to complete, but if used in tight loops with Conditional Branches, an
+to complete, but if commonly used in tight loops with Conditional Branches, an
 Out-of-Order system with Speculative capability may need significantly
 more Reservation Stations to hold in-flight data for instructions which
-take longer than those which do not.
+take longer than those which do not, so even a single clock cycle reduction
+could become important.
+
+A rule of thumb is that in Hardware, at 4.8 GHz, the budget for what is called
+"gate propagation delay" is only around 16 to 19 gates chained one after
+the other. Anything beyond that budget will need to be stored in DFFs
+(Flip-flops) and another set of 16-19 gates continues on the next clock
+cycle. Thus, for example, with `grevlut` above it is almost certainly the
+case that high-performance high-clock-rate systems would need at least
+two clock cycles (two pipeline stages) to produce a valid result.
+This in turn brings us to the next question, as it is common to consider
+subdividing complex instructions into smaller parts.
 
 **Can one instruction do the job of many?**
 
@@ -596,6 +607,14 @@ as a set of four, instead of over 30 separate instructions.
 Aside from anything this strategy makes the ISA Working Group's evaluation
 task easier, as well as reducing the work of writing a Compliance Test Suite.
 
+In the case of the MIPS 3D ASE Extension, a Reciprocal-Square-Root
+instruction was proposed that was split into two halves: 12-14 bit
+accuracy completing in 7 cycles and "Carry On And Get Better Accuracy"
+for the second instruction! With 3D only needing reduced accuracy,
+the saving in power consumption and time was definitely worthwhile,
+and it neatly illustrates a counter-example to trying to make one
+instruction do too much.
+
 **Summary**
 
 There are many tradeoffs here, it is a huge list of considerations: any
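The gate-budget rule of thumb in the first hunk can be checked with a little arithmetic. The following is a minimal Python sketch, assuming the whole clock period is available for logic. It ignores flip-flop setup, clock-to-Q and clock-skew overheads, which make the real budget tighter. The function names are hypothetical, and the 30-gate logic depth is an illustrative figure only: the text gives no gate depth for `grevlut`.

```python
import math


def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds: 1 / f, with f given in GHz."""
    return 1000.0 / freq_ghz


def implied_gate_delay_ps(freq_ghz: float, gates_per_cycle: int) -> float:
    """Average per-gate delay if `gates_per_cycle` gates must fit in one period.

    Assumption: the full period is usable for logic (no flip-flop or skew overhead).
    """
    return cycle_time_ps(freq_ghz) / gates_per_cycle


def pipeline_stages(logic_depth_gates: int, gates_per_cycle: int) -> int:
    """Minimum number of clock cycles (pipeline stages) for a chain of gates,
    with intermediate results latched in flip-flops at each stage boundary."""
    return math.ceil(logic_depth_gates / gates_per_cycle)


if __name__ == "__main__":
    period = cycle_time_ps(4.8)
    print(f"period at 4.8 GHz: {period:.1f} ps")
    for gates in (16, 19):
        print(f"{gates} gates/cycle -> ~{implied_gate_delay_ps(4.8, gates):.1f} ps per gate")
    # Hypothetical 30-gate-deep datapath (illustrative only, not a grevlut figure)
    print("stages needed:", pipeline_stages(30, 16))
```

Under these assumptions the 16-19 gate figure corresponds to roughly 11-13 ps per gate at 4.8 GHz. Any datapath even one gate deeper than the budget needs a second pipeline stage.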
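The MIPS 3D example in the second hunk follows a common estimate-then-refine pattern. The following is a minimal Python sketch, assuming the second instruction performs one Newton-Raphson step. That is a standard choice, but the exact MIPS 3D instruction semantics are not reproduced here. `rsqrt_estimate` and `rsqrt_refine` are hypothetical names. The 12-bit first step is simulated by truncating the exact result, whereas hardware would more likely use a small lookup table.

```python
import math


def rsqrt_estimate(x: float, bits: int = 12) -> float:
    """Hypothetical first step: a reduced-accuracy 1/sqrt(x).

    Simulated by truncating the exact result's mantissa to `bits` bits,
    so the relative error is below 2**-(bits - 1).
    """
    exact = 1.0 / math.sqrt(x)
    m, e = math.frexp(exact)  # exact == m * 2**e, with 0.5 <= m < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def rsqrt_refine(x: float, y: float) -> float:
    """Hypothetical second step: one Newton-Raphson iteration for 1/sqrt(x).

    If y has relative error eps, the result has relative error of about
    1.5 * eps**2, roughly doubling the number of correct bits.
    """
    return y * (1.5 - 0.5 * x * y * y)


def rel_err(approx: float, x: float) -> float:
    """Relative error of `approx` against a double-precision 1/sqrt(x)."""
    exact = 1.0 / math.sqrt(x)
    return abs(approx - exact) / exact


if __name__ == "__main__":
    for x in (2.0, 3.0, 0.1, 12345.678):
        y0 = rsqrt_estimate(x)
        y1 = rsqrt_refine(x, y0)
        print(f"x={x}: step 1 error {rel_err(y0, x):.2e}, step 2 error {rel_err(y1, x):.2e}")
```

The error after refinement is roughly the square of the error before it. Code that can tolerate 12-bit accuracy can therefore skip the second step entirely, which is where the power and time saving described in the text comes from.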