From f95866fd04925aacf6c05ca1262e10784707da0f Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 26 Oct 2020 17:02:57 +0000
Subject: [PATCH]

---
 openpower/sv/predication.mdwn | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/predication.mdwn b/openpower/sv/predication.mdwn
index f40b2e74f..3360052b8 100644
--- a/openpower/sv/predication.mdwn
+++ b/openpower/sv/predication.mdwn
@@ -22,7 +22,7 @@
 * two modes, "zeroing" and "non-zeroing". zeroing mode places a zero in the masked-out element results, where non-zeroing leaves the destination (result) element unmodified.
 * predicate must be invertable via an opcode bit (to avoid the need for an instruction which inverts all bits of the predicate mask)
 
-Implementation note: even in in-order microarchitectures it is strongly adviseable to use byte-level write-enable lines on the register file. This in combination with 8b-bit SIMD element overrides allows, in "non-zeroing" mode, the predicate mask to be directly ANDed with the regfile write-enable lines to achieve the required functionality. The alternative is to perform a READ-MODIFY-MASK-WRITE cycle which is costly and compromises performance. Avoided very simply with byte-level write-enable.
+Implementation note: even in in-order microarchitectures it is strongly adviseable to use byte-level write-enable lines on the register file. This in combination with 8-bit SIMD element overrides allows, in "non-zeroing" mode, the predicate mask to be directly ANDed with the regfile write-enable lines to achieve the required functionality. The alternative is to perform a READ-MODIFY-MASK-WRITE cycle which is costly and compromises performance. Avoided very simply with byte-level write-enable.
 
 # Proposals
 
@@ -60,7 +60,7 @@ datapath to the relevant FUs. This could be reduced by adding yet another
 type of special virtual register port or datapath that masks out the
 required predicate bits closer to the regfile.
 
-another disadvantage is that the CR regfile needs to be expanded from 8x 4bit CRs to a minimum of 64x or preferably 128x 4-bit CRs. Beyond that rhey can be transferred using vectirised mfcr and mtcrf into INT regs. this is a huge number of CR regs, each of which will need a DM column in the FU-REGs Matrix. however this cost can be mitigated through regfile cacheing, bringing FU-REGs column numbers back down to "sane".
+another disadvantage is that the CR regfile needs to be expanded from 8x 4bit CRs to a minimum of 64x or preferably 128x 4-bit CRs. Beyond that they can be transferred using vectorised mfcr and mtcrf into INT regs. this is a huge number of CR regs, each of which will need a DM column in the FU-REGs Matrix. however this cost can be mitigated through regfile cacheing, bringing FU-REGs column numbers back down to "sane".
 
 ### Predicated SIMD HI32-LO32 FUs
 
-- 
2.30.2
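
Note on the implementation paragraph touched by the first hunk: the difference between "zeroing" and "non-zeroing" mode, and the way the predicate mask can be ANDed directly with byte-level write-enable lines, can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: a 64-bit register is modelled as 8 byte lanes with 8-bit elements, and the names `predicated_write`, `dest`, `result`, `mask` and `lanes` are hypothetical, not taken from predication.mdwn or any implementation.

```python
def predicated_write(dest, result, mask, zeroing=False, lanes=8):
    """Behavioural model of one predicated regfile write (hypothetical).

    dest   : current register contents, one byte value per lane
    result : newly computed element results, one byte value per lane
    mask   : predicate mask, bit i controls element (lane) i
    """
    full_wen = (1 << lanes) - 1  # all byte write-enable lines asserted
    if zeroing:
        # zeroing mode: every lane is written, masked-out lanes get zero
        wen = full_wen
        data = [result[i] if (mask >> i) & 1 else 0 for i in range(lanes)]
    else:
        # non-zeroing mode: predicate ANDed with the write-enable lines,
        # so masked-out lanes are simply never written (no
        # READ-MODIFY-MASK-WRITE cycle is needed)
        wen = full_wen & mask
        data = result
    # a lane takes the new data only where its write-enable is set
    return [data[i] if (wen >> i) & 1 else dest[i] for i in range(lanes)]


old = [0x11] * 8
new = [1, 2, 3, 4, 5, 6, 7, 8]
non_zeroing = predicated_write(old, new, mask=0b00000101)
zeroed = predicated_write(old, new, mask=0b00000101, zeroing=True)
```

In the non-zeroing call only lanes 0 and 2 change and the rest keep their old contents; in the zeroing call the masked-out lanes are overwritten with zero.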