\newpage{}
-# Condition Register SVP64 Operations
-
-Condition Register Fields are only 4 bits wide: this presents some
-interesting conceptual challenges for SVP64, which was designed
-primarily for vectors of arithmetic and logical operations. However,
-if predicates may be bits of CR Fields, it makes sense to extend
-Simple-V to cover CR Operations, especially given that the CR Fields
-produced by Vectorised Rc=1 may be processed by Vectorised CR Operations
-whose results may in turn usefully become Predicate Masks to yet more
-Vector operations, like so:
-
-```
- sv.cmpi/ew=8 *B,*ra,0 # compare bytes against zero
- sv.cmpi/ew=8 *B2,*ra,13 # and against newline
- sv.cror PM.EQ,B.EQ,B2.EQ # OR compares to create mask
- sv.stb/sm=EQ ... # store only nonzero/newline
-```
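
The mask-building sequence above can be modelled in plain Python. This is only an illustrative sketch: the helper names (`cmpi_eq`, `cror`, `predicated_store`) are hypothetical, each CR Field is reduced to its EQ bit alone, and the store is assumed to write only those elements whose mask bit is set.

```python
def cmpi_eq(values, imm):
    """EQ bit of an element-wise compare against an immediate."""
    return [int(v == imm) for v in values]


def cror(a_bits, b_bits):
    """Element-wise OR of two vectors of CR bits."""
    return [a | b for a, b in zip(a_bits, b_bits)]


def predicated_store(memory, values, mask):
    """Store only the elements whose mask bit is 1 (assumed semantics)."""
    out = list(memory)
    for i, (v, m) in enumerate(zip(values, mask)):
        if m:
            out[i] = v
    return out


data = [97, 98, 13, 0]      # bytes held in the vector *ra
b = cmpi_eq(data, 0)        # models sv.cmpi/ew=8 *B,*ra,0
b2 = cmpi_eq(data, 13)      # models sv.cmpi/ew=8 *B2,*ra,13
pm = cror(b, b2)            # models sv.cror PM.EQ,B.EQ,B2.EQ
stored = predicated_store([None] * 4, data, pm)
```

The point of the sketch is the data flow: two vectors of compare results are combined by a Vectorised CR operation, and the combined vector is then consumed as a predicate mask.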
-
-Element width however is clearly meaningless for a 4-bit collation of
-Conditions (LT, GT, EQ, SO). Likewise, arithmetic saturation (an important
-part of Arithmetic SVP64) has no meaning. An alternative Mode Format is
-required, and given that elwidths are meaningless for CR Fields the bits
-in SVP64 `RM` may be used for other purposes.
-
-This alternative mapping **only** applies to instructions that **only**
-reference a CR Field or CR bit as the sole exclusive result. This section
-**does not** apply to instructions which primarily produce arithmetic
-results that also, as an aside, produce a corresponding CR Field (such as
-when Rc=1). Instructions that involve Rc=1 are definitively arithmetic
-in nature, where the corresponding Condition Register Field can be
-considered to be a "co-result". Such CR Field "co-result" arithmetic
-operations are firmly out of scope for this section, being covered fully
-by [[sv/normal]].
-
-* Examples of v3.0B instructions to which this section does
- apply are
- - `mfcr` and `cmpi` (3 bit operands) and
- - `crnor` and `crand` (5 bit operands).
-* Examples to which this section does **not** apply include
- `fadds.` and `subf.` which both produce arithmetic results
- (and a CR Field co-result).
-
-The CR Mode Format still applies to `sv.cmpi` because despite
-taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
-instruction is purely to a Condition Register Field.
-
-Other modes are still applicable and include:
-
-* **Data-dependent fail-first**.
- This is useful for truncating VL based on analysis of a Condition
- Register result bit.
-* **Reduction**.
- Reduction is useful for analysing a Vector of Condition Register Fields
- and reducing it to one single Condition Register Field.
-
-Predicate-result mode does not make sense for CR operations. In
-arithmetic operations with Rc=1 a co-result (a CR Field) is created, and
-testing that co-result decides whether or not to store the main result.
-For CR Ops the CR Field result *is* the main result, so there is nothing
-separate to test.
-
-## Format
-
-SVP64 RM `MODE` (includes `ELWIDTH_SRC` bits) for CR-based operations:
-
-| 6 | 7 | 19-20 | 21 | 22-23 | description |
-|----|-----|-------|-----|--------|-----------------|
-| / | / | 0 RG | 0 | dz sz | simple mode |
-| / | / | 0 RG | 1 | dz sz | scalar reduce mode (mapreduce) |
-| zz | SNZ | 1 VLi | inv | CR-bit | Ffirst 3-bit mode |
-| / | SNZ | 1 VLi | inv | dz sz | Ffirst 5-bit mode (implies CR-bit from result) |
-
-Fields:
-
-* **sz / dz** if predication is enabled, zeros are put into the dest
- (or used as the src, in the case of twin predication) when the predicate
- bit is zero. Otherwise the element is ignored or skipped, depending on
- context.
-* **zz** sets both sz and dz equal to this flag.
-* **SNZ** in fail-first mode, on the bit being tested, when sz=1 and
- SNZ=1 a value "1" is put in place of "0".
-* **inv CR-bit** just as in branches (BO) these bits allow testing of
- a CR bit and whether it is set (inv=0) or unset (inv=1)
-* **RG** inverts the Vector Loop order (VL-1 downto 0) rather
- than the normal 0..VL-1
-* **SVM** sets "subvector" reduce mode
-* **VLi** VL inclusive: in fail-first mode, the truncation of
- VL *includes* the current element at the failure point rather
- than excludes it from the count.
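
As a rough illustration of the `dz` behaviour described in the list above, the following sketch models destination zeroing for a predicated vector write. The function name and the representation of elements as plain integers are assumptions made purely for illustration; source zeroing (`sz`) and twin predication are not modelled.

```python
def write_results(dest, results, mask, dz):
    """Model a predicated write to a destination vector.

    Masked-in elements receive the new result. Masked-out elements are
    either zeroed (dz=1) or skipped and left untouched (dz=0).
    """
    out = list(dest)
    for i, (r, m) in enumerate(zip(results, mask)):
        if m:
            out[i] = r      # predicate bit set: write the result
        elif dz:
            out[i] = 0      # predicate bit clear, zeroing enabled
    return out
```

With a mask of `[1, 0, 1]`, the middle element keeps its old value when `dz=0` and becomes zero when `dz=1`.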
-
-## Data-dependent fail-first on CR operations
-
-The principle of data-dependent fail-first is that if, during the course
-of sequentially evaluating an element's Condition Test, one such test
-is encountered which fails, then VL (Vector Length) is truncated (set)
-at that point. In the case of Arithmetic SVP64 Operations the Condition
-Register Field generated from Rc=1 is used as the basis for the truncation
-decision. However with CR-based operations that CR Field result to be
-tested is provided *by the operation itself*.
-
-Data-dependent SVP64 Vectorised Operations involving the creation
-or modification of a CR can require an extra two bits, which are not
-available in the compact space of the SVP64 RM `MODE` Field. With the
-concept of element width overrides being meaningless for CR Fields it
-is possible to use the `ELWIDTH` field for alternative purposes.
-
-Condition Register based operations such as `sv.mfcr` and `sv.crand`
-can thus be made more flexible. However the rules that apply in this
-section also apply to future CR-based instructions.
-
-There are two primary types of CR operations:
-
-* Those which have a 3-bit operand field (referring to a CR Field)
-* Those which have a 5-bit operand (referring to a bit within the
- whole 32-bit CR)
-
-Examining these two types it is observed that the difference may
-be considered to be that the 5-bit variant *already* provides the
-prerequisite information about which CR Field bit (LT, GT, EQ, SO) is
-to be operated on by the instruction. Thus, logically, we may set the
-following rule:
-
-* When a 5-bit CR Result field is used in an instruction, the
- 5-bit variant of Data-Dependent Fail-First
- must be used, i.e. the bit of the CR Field to be tested is
- the one that has just been modified (created) by the operation.
-* When a 3-bit CR Result field is used the 3-bit variant
- must be used, providing as it does the missing `CRbit` field
- in order to select which CR Field bit of the result shall
- be tested (LT, GT, EQ, SO).
-
-The 3-bit CR variant needs the additional CR-bit field because the
-3-bit CR Field operand of the base Power ISA v3.0B operation identifies
-only a whole CR Field: it does not contain the two CR Field Selector
-bits. These two bits (to select LT, GT, EQ or SO) must therefore be
-provided in another way.
-
-Examples of each type:
-
-* crand, cror, crnor. These are all 5-bit (BA, BB, BT). The bit
- to be tested against `inv` is the one selected by `BT`.
-* mcrf. This has only 3-bit operands (BF, BFA). In order to select the
- bit to be tested, the alternative encoding must be used.
- With `CRbit` coming from the SVP64 RM bits 22-23 the bit
- of BF to be tested is identified.
-
-Just as with SVP64 [[sv/branches]] there is the option either to
-truncate VL to include the element being tested (`VLi=1`) or to
-exclude it (`VLi=0`).
-
-Also exactly as with [[sv/normal]] fail-first, VL cannot, unlike
-[[sv/ldst]], be set to an arbitrary value. Deterministic behaviour
-is *required*.
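
The truncation rule can be sketched in Python as below. This is a simplified model under stated assumptions, not the normative pseudocode: predication, zeroing and `SNZ` are ignored, the function name is hypothetical, and the test is taken to pass when the selected bit is set (`inv=0`) or unset (`inv=1`), following the description of `inv` in the Fields list.

```python
def ffirst_truncate(result_bits, inv, vli):
    """Return the new VL after data-dependent fail-first.

    result_bits: the tested CR bit of each element's result, in order.
    inv:         0 tests for the bit being set, 1 for it being unset.
    vli:         1 includes the failing element in VL, 0 excludes it.
    """
    for i, bit in enumerate(result_bits):
        passed = (bit == 1) if inv == 0 else (bit == 0)
        if not passed:
            # First failure: truncate VL here, deterministically.
            return i + 1 if vli else i
    return len(result_bits)  # no failure: VL is unchanged
```

For results `[1, 1, 0, 1]` with `inv=0`, element 2 is the first to fail, so VL becomes 2 with `VLi=0` and 3 with `VLi=1`.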
-
-## Reduction and Iteration
-
-As described in the svp64 Appendix, SVP64 Horizontal Reduction is a
-deterministic schedule on top of base Scalar v3.0 operations. The same
-rules apply to CR Operations, i.e. programmers must follow certain
-conventions in order for an *end result* of a reduction to be achieved.
-Unlike other Vector ISAs *there are no explicit reduction opcodes* in
-SVP64: Schedules however achieve the same effect.
-
-Due to these conventions only reduction on operations such as `crand`
-and `cror` is meaningful, because these have Condition Register Fields
-as both input and output. Meaningless operations are not prohibited,
-because the cost in hardware of doing so is prohibitive, but neither
-are they `UNDEFINED`. Implementations are still required to execute them
-but are at liberty to optimise out any operations that would ultimately
-be overwritten, as long as Strict Program Order is still observable by
-the programmer.
-
-Also bear in mind that 'Reverse Gear' may be enabled, which can be
-used in combination with overlapping CR operations to iteratively
-accumulate results. Issuing a `sv.crand` operation for example with
-`BA` differing from `BB` by one Condition Register Field would result
-in a cascade effect, where the first-encountered zero bit would set the
-result to zero, and also the results of all subsequent CR Field elements
-thereafter:
-
-```
- # sv.crand/mr/rg CR4.ge.v, CR5.ge.v, CR4.ge.v
- for i in VL-1 downto 0 # reverse gear
- CR.field[4+i].ge &= CR.field[5+i].ge
-```
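
The cascade in the pseudocode above can be reproduced with a small Python model. It is a sketch only: each CR Field is reduced to the single bit being operated on, held in a list indexed by CR Field number, and the function name is hypothetical.

```python
def crand_reverse_gear(bits, dst_base, src_base, vl):
    """Apply bits[dst+i] &= bits[src+i] for i from VL-1 down to 0."""
    bits = list(bits)
    for i in range(vl - 1, -1, -1):     # reverse gear
        bits[dst_base + i] &= bits[src_base + i]
    return bits


# One bit per CR Field, fields 0..8. Field 6 holds the only zero
# among fields 4..8.
before = [0, 0, 0, 0, 1, 1, 0, 1, 1]
after = crand_reverse_gear(before, dst_base=4, src_base=5, vl=4)
```

Because the destination of one step is the source of the next, the zero in field 6 propagates downwards: fields 5 and 4 are cleared, while fields 7 and 8, processed earlier, are unaffected.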
-
-`sv.crxor` with reduction would be particularly useful for parity
-calculation for example, although there are many ways in which the same
-calculation could be carried out after transferring a vector of CR Fields
-to a GPR using crweird operations.
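
As an illustration of the parity use-case, the end result of an XOR reduction over one bit from each CR Field in a vector can be modelled as follows. Only the final value is modelled, not the SVP64 reduction schedule itself; the function name and the `initial` accumulator parameter are assumptions for the sketch.

```python
from functools import reduce
from operator import xor


def crxor_reduce(bits, initial=0):
    """XOR-reduce a vector of CR bits into a single parity bit."""
    return reduce(xor, bits, initial)
```

An odd number of set bits yields 1 and an even number yields 0, which is the parity of the vector.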
-
-Implementations are free to optimise these reductions in any way
-they see fit, as long as the end-result is compatible with Strict Program
-Order being observed, and Interrupt latency is not adversely impacted.
-
-## Unusual and quirky CR operations
-
-**cmp and other compare ops**
-
-`cmp` and `cmpi` etc take GPRs as sources and create a CR Field as a result.
-
- cmpli BF,L,RA,UI
- cmpeqb BF,RA,RB
-
-With `ELWIDTH` applying to the source GPR operands this is perfectly fine.
-
-**crweird operations**
-
-There are 4 weird CR-GPR operations and one reasonable one in
-the [[cr_int_predication]] set:
-
-* crrweird
-* mtcrweird
-* crweirder
-* crweird
-* mcrfm - reasonably normal and referring to CR Fields for src and dest.
-
-The "weird" operations have a non-standard behaviour, being able to
-treat *individual bits* of a GPR effectively as elements. They are
-expected to be Micro-coded by most Hardware implementations.
-
[[!tag opf_rfc]]
---------
-
-\newpage{}
-