* two modes, "zeroing" and "non-zeroing". zeroing mode places a zero in the masked-out element results, where non-zeroing leaves the destination (result) element unmodified.
* predicate must be invertable via an opcode bit (to avoid the need for an instruction which inverts all bits of the predicate mask)
-Implementation note: even in in-order microarchitectures it is strongly adviseable to use byte-level write-enable lines on the register file. This in combination with 8b-bit SIMD element overrides allows, in "non-zeroing" mode, the predicate mask to be directly ANDed with the regfile write-enable lines to achieve the required functionality. The alternative is to perform a READ-MODIFY-MASK-WRITE cycle which is costly and compromises performance. Avoided very simply with byte-level write-enable.
+Implementation note: even in in-order microarchitectures it is strongly adviseable to use byte-level write-enable lines on the register file. This in combination with 8-bit SIMD element overrides allows, in "non-zeroing" mode, the predicate mask to be directly ANDed with the regfile write-enable lines to achieve the required functionality. The alternative is to perform a READ-MODIFY-MASK-WRITE cycle which is costly and compromises performance. Avoided very simply with byte-level write-enable.
# Proposals
type of special virtual register port or datapath that masks out the
required predicate bits closer to the regfile.
-another disadvantage is that the CR regfile needs to be expanded from 8x 4bit CRs to a minimum of 64x or preferably 128x 4-bit CRs. Beyond that rhey can be transferred using vectirised mfcr and mtcrf into INT regs. this is a huge number of CR regs, each of which will need a DM column in the FU-REGs Matrix. however this cost can be mitigated through regfile cacheing, bringing FU-REGs column numbers back down to "sane".
+another disadvantage is that the CR regfile needs to be expanded from 8x 4bit CRs to a minimum of 64x or preferably 128x 4-bit CRs. Beyond that they can be transferred using vectorised mfcr and mtcrf into INT regs. this is a huge number of CR regs, each of which will need a DM column in the FU-REGs Matrix. however this cost can be mitigated through regfile cacheing, bringing FU-REGs column numbers back down to "sane".
### Predicated SIMD HI32-LO32 FUs