CR_bit = (CR_reg & (1<<bit_index)) != 0
When it comes to applying SV, it is the CR\_reg number to which SV EXTRA2/3
-applies, **not** the CR\_bit portion (bits 0:1).
+applies, **not** the CR\_bit portion (bits 0:1):
if extra3_mode:
spec = EXTRA3
else:
spec = EXTRA2<<1 | 0b0
- # constructs "BA[2:4] spec[0:1] 00 BA[0:1]"
- return ((BA >> 2)<<6) | # hi 3 bits shifted up
- (spec[0:1]<<4) | # to make room for these
+ if spec[2]:
+ # vector constructs "BA[2:4] spec[0:1] 0 BA[0:1]"
+ return ((BA >> 2)<<5) | # hi 3 bits shifted up
+ (spec[0:1]<<3) | # to make room for these
(BA & 0b11) # CR_bit on the end
+ else:
+ # scalar constructs "0 spec[0:1] BA[0:4]"
+ return (spec[0:1] << 5) | BA
Thus, for example, to access a given bit for a CR in SV mode, the v3.0B
-algorithm to determine CR\_reg is modified to as follows, noting that there are now 16 32 bit CRs, and that the element progression is *not linear*:
-
- def get_cr_bit(BA, idx): # for idx 0 to VL-1
- CR_index = 7-(BA>>2) # top 3 bits but BE
- CR_index = (CR_index<<4) | (spec[0:1] << 2)
- # first get one of the 16 32-bit CRs
- CR_row = (CR_index>>4) + (idx&0xf)
- CR = CRfile[CR_row]
- # now get the 4 bit CRn in that 32-bit CR
- CR_col = (CR_index + (idx>>4)) & 0x7
- CR_reg = CR{CR_col} # get 4 bit CRn
- # same as for v3.0/v3.1 from this point onwards
- bit_index = 3-(BA & 0b11) # low 2 bits but BE
- # finally get the bit from the CR.
- CR_bit = (CR_reg & (1<<bit_index)) != 0
+algorithm to determine CR\_reg is modified as follows:
+
+    CR_index = 7-(BA>>2) # top 3 bits but BE
+    if spec[2]:
+        # vector mode
+        CR_index = (CR_index<<3) | (spec[0:1] << 1)
+    else:
+        # scalar mode
+        CR_index = (spec[0:1]<<3) | CR_index
+    # same as for v3.0/v3.1 from this point onwards
+    bit_index = 3-(BA & 0b11) # low 2 bits but BE
+    CR_reg = CR{CR_index} # get the CR
+    # finally get the bit from the CR.
+    CR_bit = (CR_reg & (1<<bit_index)) != 0
Note here that the decoding pattern to determine CR\_bit does not change.
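
As a rough illustration, the revised (`+`) decode can be sketched in plain Python. This is only a sketch under stated assumptions, not the reference implementation: it assumes MSB0 numbering, so that `spec[0:1]` is the upper two bits of the 3-bit `spec` and `spec[2]` is its lowest bit, and it models the CR file as a flat list of 4-bit integers. The names `sv_cr_index`, `sv_cr_bit` and `cr_file` are hypothetical.

```python
def sv_cr_index(BA, spec):
    """Return the 4-bit CR field number selected by BA and spec.

    BA is the 5-bit instruction field; spec is the 3-bit value built
    from EXTRA2/EXTRA3. Assumption: MSB0 numbering, so spec[0:1] is
    the upper two bits and spec[2] is the lowest bit.
    """
    spec_hi = (spec >> 1) & 0b11      # spec[0:1]
    is_vector = spec & 0b1            # spec[2]
    cr_index = 7 - (BA >> 2)          # top 3 bits of BA, but BE
    if is_vector:
        # vector mode: BA's CR number supplies the high bits
        return (cr_index << 3) | (spec_hi << 1)
    # scalar mode: spec[0:1] extends the CR number upwards
    return (spec_hi << 3) | cr_index


def sv_cr_bit(cr_file, BA, spec):
    """Return the selected bit of the selected CR field as a bool.

    cr_file is a hypothetical flat list of 4-bit CR field values.
    """
    cr_reg = cr_file[sv_cr_index(BA, spec)]
    bit_index = 3 - (BA & 0b11)       # low 2 bits of BA, but BE
    # unchanged from v3.0/v3.1: test the bit within the 4-bit field
    return (cr_reg & (1 << bit_index)) != 0


# Example: 64 entries, enough to cover every index the sketch can produce
cr_file = [0] * 64
cr_file[8] = 0b1000
print(sv_cr_bit(cr_file, 0b11100, 0b010))  # scalar mode, spec[0:1] = 0b01
```

In this sketch only the selection of the CR field depends on `spec`; the final bit test uses the low two bits of `BA` exactly as before.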
Note: high-performance implementations may read/write Vectors of CRs in
-batches of aligned 32-bit chunks. This is to greatly
+batches of aligned 32-bit chunks (CR0-7, CR8-15). This is to greatly
simplify internal design. If instructions are issued where CR Vectors
do not start on a 32-bit aligned boundary, performance may be affected.