From: Luke Kenneth Casson Leighton
Date: Wed, 6 Apr 2022 12:22:30 +0000 (+0100)
Subject: whitespace cleanup
X-Git-Tag: opf_rfc_ls005_v1~2874
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=8ef4d86611cb25d4bea5a4ca37ca367f64799357;p=libreriscv.git

whitespace cleanup
---

diff --git a/openpower/sv/cr_int_predication.mdwn b/openpower/sv/cr_int_predication.mdwn
index 83c6796bd..646b382f2 100644
--- a/openpower/sv/cr_int_predication.mdwn
+++ b/openpower/sv/cr_int_predication.mdwn
@@ -14,23 +14,33 @@ See:
 
 Rationale:
 
-Condition Registers are conceptually perfect for use as predicate masks, the only problem being that typical Vector ISAs have quite comprehensive mask-based instructions: set-before-first, popcount and much more. In fact many Vector ISAs can use Vectors *as* masks, consequently the entire Vector ISA is usually available for use in creating masks (one exception being AVX512 which
-has a dedicated Mask regfile and opcodes).
-Duplication of such operations (popcount etc) is not practical for SV given
-the strategy of leveraging pre-existing Scalar instructions in a minimalist way.
-
-With the scalar OpenPOWER v3.0B ISA having already popcnt, cntlz and others normally seen in Vector Mask operations it makes sense to allow *both* scalar integers *and* CR-Vectors to be predicate masks. That in turn means that much more comprehensive interaction between CRs and scalar Integers is required, because with the CR Predication Modes designating CR *Fields*
-(not CR bits) as Predicate Elements, fast transfers between CR *Fields* and
-the Integer Register File is needed.
-
-The opportunity is therefore taken to also augment CR logical arithmetic as well, using a mask-based paradigm that takes into consideration multiple bits of each CR Field (eq/lt/gt/ov). By contrast
-v3.0B Scalar CR instructions (crand, crxor) only allow a single bit calculation, and both mtcr and mfcr are CR-orientated rather than CR *Field*
-orientated.
+Condition Registers are conceptually perfect for use as predicate masks,
+the only problem being that typical Vector ISAs have quite comprehensive
+mask-based instructions: set-before-first, popcount and much more.
+In fact many Vector ISAs can use Vectors *as* masks, consequently the
+entire Vector ISA is usually available for use in creating masks (one
+exception being AVX512 which has a dedicated Mask regfile and opcodes).
+Duplication of such operations (popcount etc) is not practical for SV
+given the strategy of leveraging pre-existing Scalar instructions in a
+minimalist way.
+
+With the scalar OpenPOWER v3.0B ISA having already popcnt, cntlz and
+others normally seen in Vector Mask operations it makes sense to allow
+*both* scalar integers *and* CR-Vectors to be predicate masks. That in
+turn means that much more comprehensive interaction between CRs and scalar
+Integers is required, because with the CR Predication Modes designating
+CR *Fields* (not CR bits) as Predicate Elements, fast transfers between
+CR *Fields* and the Integer Register File is needed.
+
+The opportunity is therefore taken to also augment CR logical arithmetic
+as well, using a mask-based paradigm that takes into consideration
+multiple bits of each CR Field (eq/lt/gt/ov). By contrast v3.0B Scalar
+CR instructions (crand, crxor) only allow a single bit calculation, and
+both mtcr and mfcr are CR-orientated rather than CR *Field* orientated.
 
 Also strangely there is no v3.0 instruction for directly moving CR Fields,
-only CR *bits*,
-so that is corrected here with `mcrfm`. The opportunity is taken
-to allow inversion of CR Field bits, when copied.
+only CR *bits*, so that is corrected here with `mcrfm`. The opportunity
+is taken to allow inversion of CR Field bits, when copied.
 
 Basic concept:
 
@@ -57,10 +67,14 @@ Side-effects:
 
 (Twin) Predication interactions:
 
-* INT twin predication with zeroing is a way to copy an integer into CRs without necessarily needing the INT register (RA). if it is, it is effectively ANDed (or negate-and-ANDed) with the INT Predicate
-* CR twin predication with zeroing is likewise a way to interact with the incoming integer
+* INT twin predication with zeroing is a way to copy an integer into
+  CRs without necessarily needing the INT register (RA). if it is, it is
+  effectively ANDed (or negate-and-ANDed) with the INT Predicate
+* CR twin predication with zeroing is likewise a way to interact with
+  the incoming integer
 
-this gets particularly powerful if data-dependent predication is also enabled. further explanation is below.
+this gets particularly powerful if data-dependent predication is also
+enabled. further explanation is below.
 
 # Bit ordering.
 
@@ -99,8 +113,9 @@ bit 19=0, bit 20=0
     If Rc:
         CR0 = analyse(RT)
 
-When used with SVP64 Prefixing this is a [[openpower/sv/normal]] SVP64 type operation and as
-such can use Rc=1 and RC1 Data-dependent Mode capability
+When used with SVP64 Prefixing this is a [[openpower/sv/normal]]
+SVP64 type operation and as such can use Rc=1 and RC1 Data-dependent
+Mode capability
 
 **mtcrweird**
 
@@ -125,9 +140,9 @@ M=1. Correspondingly when M=0 this operation is an
 overwrite: no read of BF is required because the
 masked-out bits of the BF CR Field are set to zero.
 
-When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
-3-bit Data-dependent and 3-bit Predicate-result capability
-(BF is 3 bits)
+When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64
+type operation that has 3-bit Data-dependent and 3-bit Predicate-result
+capability (BF is 3 bits)
 
 **crweird**
 
@@ -151,9 +166,9 @@ M=1. Correspondingly when M=0 this operation is an
 overwrite: no read of BF is required because the
 masked-out bits of the BF CR Field are set to zero.
 
-When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
-3-bit Data-dependent and 3-bit Predicate-result capability
-(BF is 3 bits)
+When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64
+type operation that has 3-bit Data-dependent and 3-bit Predicate-result
+capability (BF is 3 bits)
 
 **mcrfm** - Move CR Field, masked.
 
@@ -173,15 +188,15 @@ M=1. Correspondingly when M=0 this operation is an
 overwrite: no read of BF is required because the
 masked-out bits of the BF CR Field are set to zero.
 
-When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
-3-bit Data-dependent and 3-bit Predicate-result capability
-(BF is 3 bits)
+When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64
+type operation that has 3-bit Data-dependent and 3-bit Predicate-result
+capability (BF is 3 bits)
 
-*Programmer's note: `mode` being XORed onto the result provides considerable
-flexibility. individual bits of BFA may be copied inverted to BF by
-ensuring that `mask` and `mode` have the same bit set. Also, individual
-bits in BF may be set to 1 by ensuring that the required bit of `mask`
-is set to zero and the same bit in `mode` is set to 1*
+*Programmer's note: `mode` being XORed onto the result provides
+considerable flexibility. individual bits of BFA may be copied inverted
+to BF by ensuring that `mask` and `mode` have the same bit set. Also,
+individual bits in BF may be set to 1 by ensuring that the required bit of
+`mask` is set to zero and the same bit in `mode` is set to 1*
 
 **crweirder**
 
@@ -199,9 +214,9 @@ bit 19=1, bit 20=1
     result = n0|n1|n2|n3 if M else n0&n1&n2&n3
     CR{BF}[bit] = result
 
-When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
-5-bit Data-dependent and 5-bit Predicate-result capability
-(BFT is 5 bits)
+When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64
+type operation that has 5-bit Data-dependent and 5-bit Predicate-result
+capability (BFT is 5 bits)
 
 **Example Pseudo-ops:**
 
@@ -211,18 +226,21 @@ When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type oper
 
 # Vectorised versions
 
-The name "weird" refers to a minor violation of SV rules when it comes to deriving the Vectorised versions of these instructions.
+The name "weird" refers to a minor violation of SV rules when it comes
+to deriving the Vectorised versions of these instructions.
 
-Normally the progression of the SV for-loop would move on to the next register.
-Instead however in the scalar case these instructions **remain in the same register** and insert or transfer between **bits** of the scalar integer source or destination.
+Normally the progression of the SV for-loop would move on to the
+next register. Instead however in the scalar case these instructions
+**remain in the same register** and insert or transfer between **bits**
+of the scalar integer source or destination.
 
 Further useful violation of the normal SV Elwidth override rules allows
-for packing (or unpacking) of multiple CR test results into
-(or out of) an Integer Element. Note
-that the CR (source operand) elwidth field is utilised to determine the bit-
-packing size (1/2/4/8 with remaining bits within the Integer element
-set to zero) whilst the INT (dest operand) elwidth field still sets
-the Integer element size as usual (8/16/32/default)
+for packing (or unpacking) of multiple CR test results into (or out of)
+an Integer Element. Note that the CR (source operand) elwidth field is
+utilised to determine the bit- packing size (1/2/4/8 with remaining
+bits within the Integer element set to zero) whilst the INT (dest
+operand) elwidth field still sets the Integer element size as usual
+(8/16/32/default)
 
     crrweird: RT, BB, mask.mode
 
@@ -271,11 +289,13 @@ Note that:
 
 # v3.1 setbc instructions
 
-there are additional setb conditional instructions in v3.1 (p129)
+There are additional setb conditional instructions in v3.1 (p129)
 
     RT = (CR[BI] == 1) ? 1 : 0
 
-which also negate that, and also return -1 / 0. these are similar to crweird but not the same purpose. most notable is that crweird acts on CR fields rather than the entire 32 bit CR.
+which also negate that, and also return -1 / 0. these are similar to
+crweird but not the same purpose. most notable is that crweird acts on
+CR fields rather than the entire 32 bit CR.
 
 # Predication Examples
 
@@ -284,11 +304,10 @@ Take the following example:
     r10 = 0b00010
     sv.mtcrweird/dm=r10/dz cr8.v, 0, 0b0011.0000
 
-Here, RA is zero, so the source input is zero. The destination
-is CR Field 8, and the destination predicate mask indicates
-to target the first two elements. Destination predicate zeroing is
-enabled, and the destination predicate is only set in the 2nd bit.
-mask is 0b0011, mode is all zeros.
+Here, RA is zero, so the source input is zero. The destination is CR Field
+8, and the destination predicate mask indicates to target the first two
+elements. Destination predicate zeroing is enabled, and the destination
+predicate is only set in the 2nd bit. mask is 0b0011, mode is all zeros.
 
 Let us first consider what should go into element 0 (CR Field 8):
 
@@ -306,11 +325,11 @@ Now the second element, CR Field 9 (CR9):
 * Therefore, CR9 is set (using LSB0 ordering) to 0b0011, i.e. to mask.
 
 It should be clear that this instruction uses bits of the integer
-predicate to decide whether to set CR Fields to `(mask & ~mode)`
-or to zero. Thus, in effect, it is the integer predicate that has
-been copied into the CR Fields.
-
-By using twin predication, zeroing, and inversion (sm=~r3, dm=r10) for example, it becomes possible to combine two Integers together in
-order to set bits in CR Fields.
-Likewise there are dozens of ways that CR Predicates can be used, on the
-same sv.mtcrweird instruction.
+predicate to decide whether to set CR Fields to `(mask & ~mode)` or
+to zero. Thus, in effect, it is the integer predicate that has been
+copied into the CR Fields.
+
+By using twin predication, zeroing, and inversion (sm=~r3, dm=r10) for
+example, it becomes possible to combine two Integers together in order
+to set bits in CR Fields. Likewise there are dozens of ways that CR
+Predicates can be used, on the same sv.mtcrweird instruction.
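
The final hunk's prose says that `sv.mtcrweird` uses bits of the integer predicate to choose, per CR Field, between `(mask & ~mode)` and zero. A minimal Python sketch of only that stated behaviour, for the RA=0 case with destination zeroing, might look like the following. The function name `mtcrweird_zero_source`, the dict-based CR model and the choice of VL=2 (taken from "target the first two elements") are assumptions made for illustration; the full instruction pseudocode is not shown in this diff and is not reproduced here.

```python
def mtcrweird_zero_source(dest_pred, first_cr, vl, mask, mode):
    """Model the RA=0, dz=1 case described in the patched text.

    dest_pred : integer destination predicate (bit i selects element i)
    first_cr  : number of the first destination CR Field (8 for cr8.v)
    vl        : vector length (hypothetical parameter: elements to process)
    mask/mode : 4-bit immediates, LSB0 ordering as in the text
    Returns a dict mapping CR Field number -> 4-bit value.
    """
    crs = {}
    for i in range(vl):
        if (dest_pred >> i) & 1:
            # predicate bit set: field receives (mask & ~mode), per the text
            crs[first_cr + i] = mask & ~mode & 0xF
        else:
            # predicate bit clear with zeroing enabled: field is zeroed
            crs[first_cr + i] = 0
    return crs


# r10 = 0b00010
# sv.mtcrweird/dm=r10/dz cr8.v, 0, 0b0011.0000
fields = mtcrweird_zero_source(0b00010, first_cr=8, vl=2,
                               mask=0b0011, mode=0b0000)
print(fields)  # {8: 0, 9: 3}
```

Under these assumptions CR9 comes out as 0b0011, matching the "i.e. to mask" statement in the context line above, while CR8 is zeroed because its predicate bit is clear.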