# Condition Register SVP64 Operations Links: * * [[svp64]] Condition Register Fields are only 4 bits wide: this presents some interesting conceptual challenges for SVP64, particularly with respect to element width (which is clearly meaningless for a 4-bit collation of Conditions, EQ LT GE SO). Likewise, arithmetic saturation (an important part of Arithmetic SVP64) has no meaning. Additionally, extra modes are required that only make sense for Vectorised CR Operations. Consequently an alternative Mode Format is required. This alternative mapping **only** applies to instructions that **only** reference a CR Field or CR bit as the sole exclusive result. This section **does not** apply to instructions which primarily produce arithmetic results that also, as an aside, produce a corresponding CR Field (such as when Rc=1). Instructions that involve Rc=1 are definitively arithmetic in nature, where the corresponding Condition Register Field can be considered to be a "co-result". Such CR Field "co-result" arithmeric operations are firmly out of scope for this section. * Examples of v3.0B instructions to which this section does apply is `mfcr` (3 bit operands) and `crnor` (5 bit operands). * Examples to which this section does **not** apply include `fadds.` and `subf.` which both produce arithmetic results (and a CR Field co-result). Other modes are still applicable and include: * **Data-dependent fail-first**. useful to truncate VL based on analysis of a Condition Register result bit. * **Scalar and parallel reduction**. Reduction is useful for turning a Vector of Condition Register Fields into one single Condition Register. * **Predicate-result**. Equivalent to python "filter", in that only elements which pass a test will end up actually being modified. This is in effect the same as ANDing the Condition Test with the destination predicate mask (hence the name, "predicate-result"). Predicate-result is a particularly powerful strategic mode in that it is the interaction of a source predicate, destination predicate, input operands *and* the output result, all combining to influence what actually goes into the Condition Register File. Given that predicates may themselves be Condition Registers it can be seen that there could potentially be up to **six** CR Fields involved in the execution of Predicate-result Mode. SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations: | 4 | 5 | 19-20 | 21 | 22 23 | description | | - | - | ----- | --- |---------|----------------- | | / | / | 00 | 0 | dz sz | normal mode | | / | / | 00 | 1 | 0 RG | scalar reduce mode (mapreduce), SUBVL=1 | | / | / | 00 | 1 | 1 CRM | parallel reduce mode (mapreduce), SUBVL=1 | | / | / | 00 | 1 | SVM RG | subvector reduce mode, SUBVL>1 | |dz |VLi| 01 | inv | CR-bit | Ffirst 3-bit mode | |sz |VLi| 01 | inv | dz Rc1 | Ffirst 5-bit mode | | / | / | 10 | / | / / | RESERVED | |sz |SNZ| 11 | inv | CR-bit | 3-bit pred-result CR sel | | / |SNZ| 11 | inv | dz sz | 5-bit pred-result z/nonz | Fields: TODO # Data-dependent fail-first on CR operations The principle of data-dependent fail-first is that if a Condition Test fails then VL (Vector Length) is truncated at that point. In the case of Arithmetic SVP64 Operations the Condition Register Field generated from Rc=1 is used, however with CR-based operations that CR result is provided by the operation itself. Data-dependent SVP64 Vectorised Operations involving the creation or modification of a CR can require an extra two bits, which are not available in the compact space of the SVP64 RM `MODE` Field. With the concept of element width overrides being meaningless for CR Fields it is possible to use the `ELWIDTH` field for alternative purposes. Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus be made more flexible. However the rules that apply in this section also apply to future CR-based instructions. There are two primary different types of CR operations: * Those which have a 3-bit operand field (referring to a CR Field) * Those which have a 5-bit operand (referring to a bit within the whole 32-bit CR) Examining these two types it is observed that the difference may be considered to be that the 5-bit variant provides additional information about which CR Field bit (EQ, GE, LT, SO) is to be operated on by the instruction. Thus, logically, we may set the following rule: * When a 5-bit CR Result field is used in an instruction, the 5-bit variant of Data-Dependent Fail-First must be used. i.e. the bit of the CR field to be tested is the one that has just been modified (created) by the operation. * When a 3-bit CR Result field is used the 3-bit variant must be used, providing as it does the missing `CRbit` in order to select which CR Field bit of the result shall be tested (EQ, LE, GE, SO) The reason why the 3-bit CR variant needs the additional CR-bit field should be obvious from the fact that the 3-bit CR Field from the base Power ISA v3.0B operation clearly does not contain and is missing the two CR Field Selector bits. Thus, these two bits (to select EQ, LE, GE or SO) must be provided in another way. Examples of the former type: * crand, cror, crnor. These all are 5-bit (BA, BB, BT). The bit to be tested against `inv` is the one selected by `BT` * mcrf. This has only 3-bit (BF, BFA). In order to select the bit to be tested, the alternative encoding must be used. With `CRbit` coming from the SVP64 RM bits 22-23 the bit of BF to be tested is identified. # Predicate-result Condition Register operations These are again slightly different compared to SVP64 arithmetic pred-result (described in [[svp64/appendix]]). The reason is that, again, for arithmetic operations the production of a CR Field when Rc=1 is a *co-result* accompanying the main arithmetic result, whereas for CR-based operations the CR Field (referred to by a 3-bit v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a 5-bit operand from e.g. `crnor`) *is* itself the explicit and sole result of the operation. Therefore, logically, Predicate-result needs to be adapted to test the actual result of the CR-based instruction (rather than test the co-resultant CR when Rc=1, as is done for Arithmetic SVP64). for i in range(VL): # predication test, skip all masked out elements. # skips when sz=0 if sz=0 and predicate_masked_out(i): continue if predicate_masked_out(i): if 5bit mode: # only one bit of CR to update result = SNZ else # four copies of SNZ result = SNZ || SNZ || SNZ || SNZ else # result is to go into CR. may be a 4-bit CR Field # (3-bit mode) or just a single bit (5-bit mode) result = op(...) if 5bit mode: # if this CR op has 5-bit CR result operands # the single bit result is what must be tested to_test = result else # if however this is a 3-bit CR *field* result # then the bit to be tested must be selected to_test = result[CRbit] # now test CR, similar to branch if to_test != inv: continue # test failed: cancel store # result optionally stored update_CR(result)