From 299aeb11f4336e051c552faf123819f7faa9c64d Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 26 Oct 2020 16:48:30 +0000
Subject: [PATCH]

---
 openpower/sv/predication.mdwn | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/openpower/sv/predication.mdwn b/openpower/sv/predication.mdwn
index 7c0eaef0e..ba9b2441a 100644
--- a/openpower/sv/predication.mdwn
+++ b/openpower/sv/predication.mdwn
@@ -60,19 +60,6 @@ datapath to the relevant FUs. This could be reduced by adding yet another
 type of special virtual register port or datapath that masks out the
 required predicate bits closer to the regfile.
 
-## One scalar int per predicate element.
-
-Similar to RVV and similar to the one-CR-per-element concept above, the idea here is to use the LSB of any given element in a vector of predicates. This idea has quite a lot of merit to it.
-
-Implementation-wise just like in the CR-based case a special regfile port could be added that gets the LSB of each scalar integer register and routes them through to the broadcast bus.
-
-The disadvantages appear on closer analysis:
-
-* Unlike the "full" CR port (which reads 8x CRs CR0-7 in one hit) trying the same trick on the scalar integer regfile, to obtain 8 predicate bits, would require a whopping 8x64bit set of reads to the INT regfile instead of a scant 1x32bit read. Resource-wise, then, this idea is expensive.
-* With predicate bits being distributed out amongst 64 bit scalar registers, scalar bitmanipulation operations that can be performed after transferring Vectors of CMP operations from CRs to INTs (vectorised-mfcr) are more challenging and costly. Rather than use vectorised mfcr, complex transfers of the LSBs into a single scalar int are required.
-
-On balance this is a less favourable option than vectorising CRs
-
 ### Predicated SIMD HI32-LO32 FUs
 
 an analysis of changing the element widths (for SIMD) gives the following
@@ -134,6 +121,19 @@ bandwidth can again be reduced by performing the selection of the masks
 (bit 0 thru bit 3 of each CR) closer to the regfile i.e. before hitting
 the broadcast bus.
 
+## One scalar int per predicate element.
+
+Similar to RVV and similar to the one-CR-per-element concept above, the idea here is to use the LSB of any given element in a vector of predicates. This idea has quite a lot of merit to it.
+
+Implementation-wise just like in the CR-based case a special regfile port could be added that gets the LSB of each scalar integer register and routes them through to the broadcast bus.
+
+The disadvantages appear on closer analysis:
+
+* Unlike the "full" CR port (which reads 8x CRs CR0-7 in one hit) trying the same trick on the scalar integer regfile, to obtain 8 predicate bits, would require a whopping 8x64bit set of reads to the INT regfile instead of a scant 1x32bit read. Resource-wise, then, this idea is expensive.
+* With predicate bits being distributed out amongst 64 bit scalar registers, scalar bitmanipulation operations that can be performed after transferring Vectors of CMP operations from CRs to INTs (vectorised-mfcr) are more challenging and costly. Rather than use vectorised mfcr, complex transfers of the LSBs into a single scalar int are required.
+
+On balance this is a less favourable option than vectorising CRs
+
 ## Scalar (single) integer as predicate, with one DM row
 
 This idea has several disadvantages.
-- 
2.30.2
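The register-read cost described in the "One scalar int per predicate element" section (8x64-bit INT reads versus a single 32-bit CR read to obtain 8 predicate bits) can be illustrated with a small software model. This is a hypothetical sketch, not code from this patch or from any Libre-SOC source. The function names `predicate_from_cr` and `predicate_from_ints` are invented for illustration. The model assumes that a "read" transfers the full register width, that element i maps to mask bit i, and that the CR uses Power ISA MSB0 numbering (CR0 in the most significant nibble; within each field bit 0 = LT, 1 = GT, 2 = EQ, 3 = SO).

```python
"""Hypothetical model: gathering 8 predicate bits from the CR vs the INT regfile."""
from __future__ import annotations

CR_WIDTH = 32        # assumed: the architected CR is 32 bits (CR0-CR7)
CR_FIELD_BITS = 4    # LT, GT, EQ, SO
NUM_ELEMENTS = 8     # CR0-CR7 -> 8 predicate bits
INT_WIDTH = 64       # scalar integer register width


def predicate_from_cr(cr: int, bit: int) -> tuple[int, int]:
    """Select bit `bit` (0..3, MSB0 within the field) of every CR field.

    Returns (mask, bits_read): a single full-width CR read yields all 8 bits.
    """
    if not 0 <= bit < CR_FIELD_BITS:
        raise ValueError("bit must be 0..3 (LT, GT, EQ, SO)")
    mask = 0
    for i in range(NUM_ELEMENTS):
        msb0 = CR_FIELD_BITS * i + bit          # MSB0 position within the CR
        if (cr >> (CR_WIDTH - 1 - msb0)) & 1:   # convert to an LSB0 shift
            mask |= 1 << i                      # assumed: element i -> mask bit i
    return mask, CR_WIDTH


def predicate_from_ints(regs: list[int]) -> tuple[int, int]:
    """Take the LSB of each integer register as that element's predicate.

    Returns (mask, bits_read): every element costs a full 64-bit read.
    """
    mask = 0
    for i, r in enumerate(regs):
        mask |= (r & 1) << i                    # shift-and-OR each LSB into place
    return mask, INT_WIDTH * len(regs)


if __name__ == "__main__":
    # EQ (bit 2) set in CR0, CR3 and CR7
    print(predicate_from_cr(0x20020002, 2))
    # Odd values (LSB set) in elements 0, 3 and 7
    print(predicate_from_ints([1, 0, 0, 3, 0, 0, 0, 5]))
```

Both calls in the example produce the same 8-bit mask, `0x89` (137), but the CR path accounts for 32 bits read while the INT path accounts for 512. The per-element shift-and-OR loop in `predicate_from_ints` roughly mirrors the "complex transfers of the LSBs into a single scalar int" that the section lists as a disadvantage.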