From ba624b78f98070ee56e274e0909eaf91438c2b55 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 13 Jan 2021 10:20:47 +0000
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 47 ++++++++++++++++----------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index c1f83f834..54b65f353 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -314,7 +314,9 @@ Thus the compiler when referring to CR0 still generates code that it thinks is s
 
 In concrete terms: when the Vector looping proceeds to increment Integer or FP register numbers linearly, `fp1 fp2 fp3...` when `Rc=1` the Vector of CRs should start at `CR1` just as they do in scalar execution but *not overwrite CR2*. Instead proceed to write to at least 8 or 16 CRs before doing so.
 
-Two ways in which this may occur: either for numbering to be linear (`CR0..CR127`) but to jump in increments of 8, or to be expressed as sub-numbers similar to FP: `CR1.0 CR1.1 ... CR1.15 CR2.0`. Fractional numbering is more natural and intuitive. Here is a table showing progression from 0 to VL-1 when VL=18, should an Integer Vector operation writes first to `CR0`. It is the 16th element before `CR1` is overwritten:
+Two ways in which this may occur: either for numbering to be linear (`CR0..CR127`) but to jump in increments of 8, or to be expressed as sub-numbers similar to FP fractions: `CR1.0 CR1.1 ... CR1.15 CR2.0`. Fractional numbering is more natural and intuitive. The "original" (scalar) CRs 0-7 therefore are interleaved every 16th point in the progression. They are also effectively given a second name: `CR0` is now also named `CR0.0` in effect.
+
+Here is a table showing progression from 0 to VL-1 when VL=18, should an Integer Vector operation writes first to `CR0`. It is the 16th element before `CR1` is overwritten:
 
     CRn.0 CR0 0 CR1 16 CR2 CR3 CR4 CR5 CR6 CR7
     CRn.1 1 17
@@ -322,7 +324,7 @@ Two ways in which this may occur: either for numbering to be linear (`CR0..CR127
     ... ..
     CRn.15 15
 
-This gives an opportunity to minimise modifications to gcc and llvm for any Vectorisation up to a reasonable length of `MVL=16`.
+This gives an opportunity to minimise modifications to gcc and llvm for any Vectorisation up to a reasonable length of `MVL=16`. The register file is viewed as comprising 16 32-bit Condition Registers.
 
 ## CR EXTRA mapping table and algorithm
 
@@ -350,35 +352,32 @@ applies, **not** the CR\_bit portion (bits 0:1).
         spec = EXTRA3
     else:
         spec = EXTRA2<<1 | 0b0
-    if spec[2]:
-       # vector constructs "BA[2:4] spec[0:1] 0 BA[0:1]"
-       return ((BA >> 2)<<5) | # hi 3 bits shifted up
-              (spec[0:1]<<3) | # to make room for these
+    # constructs "BA[2:4] spec[0:1] 00 BA[0:1]"
+    return ((BA >> 2)<<6) | # hi 3 bits shifted up
+           (spec[0:1]<<4) | # to make room for these
               (BA & 0b11) # CR_bit on the end
-    else:
-       # scalar constructs "0 spec[0:1] BA[0:4]"
-       return (spec[0:1] << 5) | BA
 
 Thus, for example, to access a given bit for a CR in SV mode, the v3.0B
-algorithm to determin CR\_reg is modified to as follows:
-
-    CR_index = 7-(BA>>2) # top 3 bits but BE
-    if spec[2]:
-        # vector mode
-        CR_index = (CR_index<<3) | (spec[0:1] << 1)
-    else:
-        # scalar mode
-        CR_index = (spec[0:1]<<3) | CR_index
-    # same as for v3.0/v3.1 from this point onwards
-    bit_index = 3-(BA & 0b11) # low 2 bits but BE
-    CR_reg = CR{CR_index} # get the CR
-    # finally get the bit from the CR.
-    CR_bit = (CR_reg & (1<
[...]
+    CR_index = 7-(BA>>2) # top 3 bits but BE
+    CR_index = (CR_index<<4) | (spec[0:1] << 2)
+    # first get one of the 16 32-bit CRs
+    CR_row = (CR_index>>4) + (idx&0xf)
+    CR = CRfile[CR_row]
+    # now get the 4 bit CRn in that 32-bit CR
+    CR_col = (CR_index + (idx>>4)) & 0x7
+    CR_reg = CR{CR_col} # get 4 bit CRn
+    # same as for v3.0/v3.1 from this point onwards
+    bit_index = 3-(BA & 0b11) # low 2 bits but BE
+    # finally get the bit from the CR.
+    CR_bit = (CR_reg & (1<
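
Reviewer note: two of the ideas in this patch can be checked with a small Python sketch. It is only an illustration under stated assumptions, not the specification's algorithm. The function names `fractional_cr_name` and `pack_cr_bit_field` and the parameter `spec_hi2` are hypothetical. The sketch assumes that a vector starting at `CR0` gets 16 sub-numbers per CR (as in the VL=18 table), and that `spec[0:1]` is supplied as a plain 2-bit integer.

```python
def fractional_cr_name(element, start_cr=0, sub_per_cr=16):
    """Fractional name of the CR written by vector element `element`.

    Assumption: numbering runs CRn.0 .. CRn.15 before moving to CR(n+1).0,
    matching the VL=18 table in the patch.
    """
    cr = start_cr + element // sub_per_cr
    sub = element % sub_per_cr
    return f"CR{cr}.{sub}"


def pack_cr_bit_field(BA, spec_hi2):
    """Build the layout "BA[2:4] spec[0:1] 00 BA[0:1]" from the added lines.

    BA is the 5-bit field from the instruction.
    spec_hi2 is assumed to be spec[0:1] already extracted as a 2-bit integer.
    """
    if not 0 <= BA < 32:
        raise ValueError("BA must fit in 5 bits")
    if not 0 <= spec_hi2 < 4:
        raise ValueError("spec_hi2 must fit in 2 bits")
    return ((BA >> 2) << 6) | (spec_hi2 << 4) | (BA & 0b11)


# Progression for VL=18 starting at CR0: CR1 is first touched by element 16.
names = [fractional_cr_name(i) for i in range(18)]
packed = pack_cr_bit_field(0b10001, 0b10)
```

With these assumptions, element 15 lands on `CR0.15` and element 16 on `CR1.0`, which is the "16th element before `CR1` is overwritten" behaviour the table describes.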