From: Luke Kenneth Casson Leighton
Date: Sun, 20 Dec 2020 15:16:16 +0000 (+0000)
Subject: update CR table/pseudocode
X-Git-Tag: convert-csv-opcode-to-binary~1137
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=171ddaa62bc4cacaf20ff066b82d16214239fe81;p=libreriscv.git

update CR table/pseudocode
---

diff --git a/openpower/sv/svp_rewrite/svp64.mdwn b/openpower/sv/svp_rewrite/svp64.mdwn
index a9a70530f..0e7f10f74 100644
--- a/openpower/sv/svp_rewrite/svp64.mdwn
+++ b/openpower/sv/svp_rewrite/svp64.mdwn
@@ -494,49 +494,98 @@ RB etc. are interpreted as v3.0B / v3.1B scalar registers. This is termed
 # CR Operations
-## EXTRA mapping algorithm
+## CR EXTRA mapping table and algorithm
-Numbering relationships for CR fields are already complex due to bring in BE format. In OpenPOWER v3.0/1, BFA is 5 bits in order to select one of 4 bits from one of the 8 CRs. The numbering was determined - after 4 months - to be as follows:
+Numbering relationships for CR fields are already complex due to being
+in BE format. However with some care and consideration the exact same
+mapping used for INT and FP regfiles may be applied, just to the upper bits,
+as explained below.
-    CR_index = 7-BFA>>2 # top 3 bits but BE
-    bit_index = 3-(BFA & 0b11) # low 2 bits but BE
+In OpenPOWER v3.0/1, BF/BT/BA/BB are all 5 bits. The top 3 bits (2:4)
+select one of the 8 CRs; the bottom 2 bits (0:1) select one of 4 bits
+in that CR. The numbering was determined (after 4 months of
+analysis and research) to be as follows:
+
+    CR_index = 7-(BA>>2) # top 3 bits but BE
+    bit_index = 3-(BA & 0b11) # low 2 bits but BE
     CR_reg = CR[CR_index] # get the CR
-    # finally get the bit from the CR
+    # finally get the bit from the CR.
     CR_bit = (CR_reg & (1<<bit_index)) != 0
[...]
-        return ((BFA >> 2)<<4) | # hi 3 bits shifted up
+    if extra3_mode:
+        spec = EXTRA3
+    else:
+        spec = EXTRA2<<1 | 0b0
+    if spec[2]:
+        # vector constructs "BA[2:4] spec[0:1] BA[0:1]"
+        return ((BA >> 2)<<4) | # hi 3 bits shifted up
                (spec[0:1]<<2) | # to make room for these
-               (BFA & 0b11) # CR_bit on the end
-    else: # scalar
-        return BFA + spec[0:1] << 7
+               (BA & 0b11) # CR_bit on the end
+    else:
+        # scalar constructs "spec[0:1] BA[0:4]"
+        return BA | (spec[0:1] << 5)
+
+Thus, for example, to access a given bit for a CR in SV mode:
+
+    CR_index = 7-(BA>>2) # top 3 bits but BE
+    if spec[2]:
+        # vector mode
+        CR_index = (CR_index<<2) | (spec[0:1])
+    else:
+        # scalar mode
+        CR_index = CR_index | (spec[0:1]<<3)
+    # same as for v3.0/v3.1 from this point onwards
+    bit_index = 3-(BA & 0b11) # low 2 bits but BE
+    CR_reg = CR[CR_index] # get the CR
+    # finally get the bit from the CR.
+    CR_bit = (CR_reg & (1<<bit_index)) != 0
[...] > 0
     ... etc
-If a "cumulated" CR based analysis of results is desired (a la VSX CR6) then a followup instruction must be performed, setting "reduce" mode on the Vector of CRs, using cr ops (crand, crnor)to do so. This provides far more flexibility in analysing vectors than standard Vector ISAs. Normal Vector ISAs are typically restricted to "were all results nonzero" and "were some results nonzero". The application of mapreduce to Vectorised cr operations allows far more sophisticated analysis, particularly in conjunction with the new crweird operations see [[sv/cr_int_predication]].
-
-Note in particular that the use of a separate instruction in this way ensures that high performance multi-issue OoO inplementations do not have the computation of the cumulative analysis CR as a bottleneck and hindrance, regardless of the length of VL.
+If a "cumulated" CR based analysis of results is desired (a la VSX CR6)
+then a followup instruction must be performed, setting "reduce" mode on
+the Vector of CRs, using cr ops (crand, crnor) to do so.
+This provides far
+more flexibility in analysing vectors than standard Vector ISAs. Normal
+Vector ISAs are typically restricted to "were all results nonzero" and
+"were some results nonzero". The application of mapreduce to Vectorised
+cr operations allows far more sophisticated analysis, particularly in
+conjunction with the new crweird operations (see [[sv/cr_int_predication]]).
+
+Note in particular that the use of a separate instruction in this way
+ensures that high performance multi-issue OoO implementations do not
+have the computation of the cumulative analysis CR as a bottleneck and
+hindrance, regardless of the length of VL.
 (see [[discussion]]. some alternative schemes are described there)
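
The CR numbering pseudocode added by this patch can be sketched as runnable Python. This is only an illustration under stated assumptions, not the project's implementation. It assumes `spec` is a 3-bit integer whose bit 2 (the top bit, LSB0 numbering, matching the "(2:4)" / "(0:1)" wording in the patch) selects vector mode and whose bits 0:1 carry the extension. It also assumes the CR file is modelled as a plain list of 4-bit integers. The function names (`sv_spec`, `sv_cr_index`, `sv_cr_bit`, `sv_extend_field`) are hypothetical.

```python
def sv_spec(extra, extra3_mode):
    """Build the 3-bit spec: EXTRA3 as-is, EXTRA2 padded with a zero low bit."""
    return extra if extra3_mode else (extra << 1) | 0b0


def sv_cr_index(ba, spec):
    """Index of the CR selected by a 5-bit BA field plus the 3-bit spec."""
    cr_index = 7 - (ba >> 2)          # top 3 bits of BA, but BE-numbered
    lo2 = spec & 0b11                 # spec[0:1] (assumed LSB0 numbering)
    if (spec >> 2) & 1:               # spec[2] set: vector mode
        return (cr_index << 2) | lo2
    return cr_index | (lo2 << 3)      # scalar mode


def sv_cr_bit(cr_regs, ba, spec):
    """Fetch one bit; from the CR lookup onwards this is the v3.0/v3.1 rule."""
    bit_index = 3 - (ba & 0b11)       # low 2 bits of BA, but BE-numbered
    cr_reg = cr_regs[sv_cr_index(ba, spec)]
    return (cr_reg & (1 << bit_index)) != 0


def sv_extend_field(ba, spec):
    """Extended field number built from BA and spec."""
    lo2 = spec & 0b11
    if (spec >> 2) & 1:
        # vector constructs "BA[2:4] spec[0:1] BA[0:1]"
        return ((ba >> 2) << 4) | (lo2 << 2) | (ba & 0b11)
    # scalar constructs "spec[0:1] BA[0:4]"
    return (lo2 << 5) | ba


if __name__ == "__main__":
    crs = [0] * 32                    # assumed model: 32 CRs of 4 bits each
    crs[9] = 0b0010
    ba = 0b10110                      # CR selector 0b101, bit selector 0b10
    print(sv_cr_index(ba, 0b101), sv_cr_bit(crs, ba, 0b101))
```

With `spec` equal to zero the scalar branch leaves both the CR index and the field number unchanged, so in this sketch the v3.0/v3.1 numbering is the special case of no extension.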