# Condition Register SVP64 Operations

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, particularly with respect to
element width (which is clearly meaningless for a 4-bit
collation of Conditions: LT, GT, EQ, SO). Likewise, arithmetic saturation
(an important part of Arithmetic SVP64)
has no meaning. Additionally, extra modes are required that only make
sense for Vectorised CR Operations. Consequently an alternative Mode Format
is required.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for this section.

* Examples of v3.0B instructions to which this section does
  apply are `mfcr` (3 bit operands) and `crnor` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful
  for turning a Vector of Condition Register Fields into one
  single Condition Register.
* **Predicate-result**.
  Equivalent
  to python "filter", in that only elements which pass a test
  will end up actually being modified. This is in effect the same
  as ANDing the Condition Test with the destination predicate
  mask (hence the name, "predicate-result").
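
As an illustration of the reduction idea only, the following Python sketch
folds a Vector of 4-bit CR Fields into a single CR Field. It is not the SVP64
reduction schedule itself: the function names are hypothetical, each CR Field
is assumed to be modelled as a 4-bit integer, and bitwise AND / OR stand in
for whichever CR operation is actually being reduced.

```python
from functools import reduce

def reduce_cr_fields(fields, op):
    """Fold a vector of 4-bit CR Fields into one 4-bit CR Field.

    `fields` is a non-empty list of integers in the range 0..15
    (an assumed software model of CR Fields); `op` is any two-input
    bitwise operation.  Both names are illustrative only.
    """
    return reduce(lambda acc, f: op(acc, f) & 0xF, fields)

def cr_and(a, b):
    return a & b

def cr_or(a, b):
    return a | b

vec = [0b1010, 0b1110, 0b1011]
all_set = reduce_cr_fields(vec, cr_and)  # bits set in every element
any_set = reduce_cr_fields(vec, cr_or)   # bits set in at least one element
```

Here `all_set` ends up as `0b1010` and `any_set` as `0b1111`: the whole Vector
has been collapsed into one CR Field that may then be tested by a single
scalar branch.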

Predicate-result is a particularly powerful strategic mode
in that it is the interaction of a source predicate, destination predicate,
input operands *and* the output result, all combining to influence
what actually goes into the Condition Register File. Given that
predicates may themselves be Condition Registers it can be seen that
there could potentially be up to **six** CR Fields involved in
the execution of Predicate-result Mode.

SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations:

| 4 | 5 | 19-20 |  21 | 22   23 |  description     |
| - | - | ----- | --- |---------|----------------- |
| / | / | 00    |   0 |  dz  sz | normal mode                      |
| / | / | 00    |   1 | 0  RG   | scalar reduce mode (mapreduce), SUBVL=1 |
| / | / | 00    |   1 | 1  CRM  | parallel reduce mode (mapreduce), SUBVL=1 |
| / | / | 00    |   1 | SVM RG  | subvector reduce mode, SUBVL>1   |
|dz |VLi| 01    | inv |  CR-bit | Ffirst 3-bit mode      |
|sz |VLi| 01    | inv |  dz Rc1 | Ffirst 5-bit mode       |
| / | / | 10    |   / | /   /   |  RESERVED |
|sz |SNZ| 11    | inv | CR-bit  |  3-bit pred-result CR sel |
| / |SNZ| 11    | inv | dz  sz  |  5-bit pred-result z/nonz |
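
The top-level decision in the table above can be sketched as a small Python
classifier. This is a reading aid rather than a decoder: it assumes the
relevant bits have already been extracted from the RM field, and the function
and parameter names are hypothetical. Whether the 3-bit or 5-bit variant of
Ffirst or pred-result applies depends on the base instruction's operands, so
it is not decided here.

```python
def classify_cr_mode(bits_19_20, bit_21=0, bit_22=0, subvl=1):
    """Name the CR-ops mode family from already-extracted RM MODE bits.

    bits_19_20: 2-bit value of MODE bits 19-20
    bit_21, bit_22: single-bit values (only examined when bits_19_20 == 0)
    subvl: the SUBVL in effect (1..4)
    """
    if bits_19_20 == 0b00:
        if bit_21 == 0:
            return "normal"
        if subvl > 1:
            return "subvector reduce"
        return "parallel reduce" if bit_22 else "scalar reduce"
    if bits_19_20 == 0b01:
        return "ffirst"
    if bits_19_20 == 0b10:
        return "reserved"
    return "pred-result"
```

For example, `classify_cr_mode(0b00, bit_21=1, bit_22=1)` gives
`"parallel reduce"`, matching the third row of the table.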

Fields:

TODO

# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if a Condition Test
fails then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used. With CR-based operations, however, that CR result is provided
by the operation itself.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. With the concept of element
width overrides being meaningless for CR Fields it is possible to use the
`ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. The rules that apply in this section
also apply to future CR-based instructions.
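
The truncation principle can be sketched in Python as follows. This is a
minimal model under stated assumptions, not the normative behaviour: the
function name is hypothetical, the test is assumed to fail when the tested
bit differs from `inv` (mirroring the predicate-result pseudocode later in
this page), and `vli` is assumed to mean that the failing element is included
in the new VL.

```python
def ffirst_vl(tested_bits, inv, vl, vli=False):
    """Return the truncated VL after data-dependent fail-first.

    tested_bits: per-element value (0 or 1) of the CR bit being tested
    inv: the value the tested bit is compared against
    vl: the Vector Length before the operation
    vli: ASSUMPTION: when True the failing element is counted in VL
    """
    for i in range(vl):
        if tested_bits[i] != inv:
            # test failed at element i: truncate VL here
            return i + 1 if vli else i
    # no element failed: VL is unchanged
    return vl
```

With `tested_bits = [1, 1, 0, 1]`, `inv = 1` and `vl = 4`, the test fails at
element 2, so this model returns 2 (or 3 with `vli=True`). Subsequent
Vectorised instructions would then operate only on the truncated length.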

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
   whole 32-bit CR)

Examining these two types, the difference may be considered to be
that the 5-bit variant provides
additional information about which CR Field bit (LT, GT, EQ, SO) is to
be operated on by the instruction.
Thus, logically, we may set the following rule:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit`
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The 3-bit CR variant needs the additional `CRbit` field because
the 3-bit CR Field operand
of the base Power ISA v3.0B operation does not contain
the two CR Field Selector bits. Thus, these two
bits (to select LT, GT, EQ or SO) must be provided in another
way.

Examples of each type:

* `crand`, `cror`, `crnor`. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* `mcrf`. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.
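
The relationship between the two operand types can be sketched in Python.
The model below covers only the scalar 32-bit CR of the base ISA, using
Power ISA MSB0 numbering in which a 5-bit operand splits into a 3-bit field
number and a 2-bit selector (0=LT, 1=GT, 2=EQ, 3=SO). The helper names are
hypothetical, and it is an assumption that `CRbit` uses the same 0..3
ordering as the low two bits of a 5-bit operand.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")

def cr_bit(cr, n):
    """Return bit n (MSB0 numbering, 0..31) of a 32-bit CR value."""
    return (cr >> (31 - n)) & 1

def split_5bit(bt):
    """Split a 5-bit CR operand into (field number, bit name)."""
    return bt >> 2, CR_BIT_NAMES[bt & 0b11]

def bit_to_test_5bit(cr, bt):
    """5-bit mode: the operand already identifies the exact bit."""
    return cr_bit(cr, bt)

def bit_to_test_3bit(cr, bf, crbit):
    """3-bit mode: the two missing selector bits come from CRbit."""
    return cr_bit(cr, bf * 4 + crbit)

cr = 0x20000000  # only CR0.EQ set (MSB0 bit 2)
```

With this value of `cr`, `bit_to_test_5bit(cr, 2)` and
`bit_to_test_3bit(cr, 0, 2)` both select CR0.EQ and return 1: the 3-bit form
reaches the same bit only because `CRbit` supplies the selector that the
base operand lacks.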

# Predicate-result Condition Register operations

These are again slightly different compared to SVP64 arithmetic
pred-result (described in [[svp64/appendix]]). The reason is that,
again, for arithmetic operations the production of a CR Field when
Rc=1 is a *co-result* accompanying the main arithmetic result, whereas
for CR-based operations the CR Field (referred to by a 3-bit
v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a 5-bit operand from e.g. `crnor`)
*is* itself the explicit and sole result of the operation.

Therefore, logically, Predicate-result needs to be adapted to
test the actual result of the CR-based instruction (rather than
test the co-resultant CR when Rc=1, as is done for Arithmetic SVP64).

    for i in range(VL):
        # predication test: when sz=0, masked-out elements
        # are skipped entirely
        if sz == 0 and predicate_masked_out(i):
            continue
        if predicate_masked_out(i):
            # sz=1: masked-out elements take their value from SNZ
            if is_5bit_mode:
                # only one bit of CR to update
                result = SNZ
            else:
                # four copies of SNZ
                result = SNZ || SNZ || SNZ || SNZ
        else:
            # result is to go into CR. may be a 4-bit CR Field
            # (3-bit mode) or just a single bit (5-bit mode)
            result = op(...)
        if is_5bit_mode:
            # if this CR op has 5-bit CR result operands
            # the single bit result is what must be tested
            to_test = result
        else:
            # if however this is a 3-bit CR *field* result
            # then the bit to be tested must be selected
            to_test = result[CRbit]
        # now test CR, similar to branch
        if to_test != inv:
            continue # test failed: cancel store
        # test passed: store the result
        update_CR(result)
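
The pseudocode above can be exercised with a small runnable Python model.
This is a sketch under stated assumptions, not a reference implementation:
the per-element output of `op(...)` is passed in as a precomputed list,
the predicate is modelled as an integer bitmask with element 0 in the least
significant bit, `CRbit` is assumed to index a 4-bit field in MSB0 order
(0=LT ... 3=SO), and all names are hypothetical.

```python
def pred_result(vl, pred_mask, results, inv, sz, snz, five_bit, crbit=0):
    """Return {element index: value stored} for predicate-result mode.

    results: stand-in for op(...), one value per element
             (a single bit in 5-bit mode, a 4-bit field in 3-bit mode)
    """
    written = {}
    for i in range(vl):
        masked_out = not ((pred_mask >> i) & 1)
        if masked_out and not sz:
            continue                      # sz=0: skip masked-out elements
        if masked_out:
            # sz=1: one SNZ bit, or four copies of it for a CR Field
            result = snz if five_bit else snz * 0b1111
        else:
            result = results[i]
        if five_bit:
            to_test = result              # the single bit is the result
        else:
            to_test = (result >> (3 - crbit)) & 1  # select bit of field
        if to_test != inv:
            continue                      # test failed: cancel store
        written[i] = result
    return written

# 5-bit mode, elements 0, 1 and 3 enabled by the predicate
out = pred_result(4, 0b1011, [1, 0, 1, 1], inv=1, sz=0, snz=0,
                  five_bit=True)
```

In this example `out` is `{0: 1, 3: 1}`: element 1 is enabled but fails the
test, and element 2 is masked out with `sz=0`, so neither is stored. This is
the "filter" behaviour described earlier, where only elements passing the
test are modified.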