From d1b6a7ba9f69fd0906906315f342f051128a0e3a Mon Sep 17 00:00:00 2001 From: Luke Kenneth Casson Leighton Date: Mon, 3 Apr 2023 11:22:56 +0100 Subject: [PATCH] remove cr_ops.mdwn copy from ls010.mdwn, include from pandoc --- openpower/sv/rfc/Makefile | 6 +- openpower/sv/rfc/ls010.mdwn | 220 ------------------------------------ 2 files changed, 5 insertions(+), 221 deletions(-) diff --git a/openpower/sv/rfc/Makefile b/openpower/sv/rfc/Makefile index bb3815458..55c08588b 100644 --- a/openpower/sv/rfc/Makefile +++ b/openpower/sv/rfc/Makefile @@ -1,6 +1,9 @@ all: ls001.pdf ls002.pdf ls003.pdf ls004.pdf ls005.pdf ls006.pdf ls007.pdf -ls010.pdf: ../svp64.mdwn ls010.mdwn ../ldst.mdwn ../branches.mdwn +LS010_FILES = ../svp64.mdwn ls010.mdwn +LS010_FILES += ../ldst.mdwn ../branches.mdwn ../cr_ops.mdwn + +ls010.pdf: $(LS010_FILES) cd ../.. && pandoc \ --filter pandoc_img.py \ -V margin-top=0.9in \ @@ -14,6 +17,7 @@ ls010.pdf: ../svp64.mdwn ls010.mdwn ../ldst.mdwn ../branches.mdwn sv/rfc/ls010.mdwn \ sv/ldst.mdwn \ sv/branches.mdwn \ + sv/cr_ops.mdwn \ -s --self-contained \ --mathjax \ -o sv/rfc/ls010.pdf diff --git a/openpower/sv/rfc/ls010.mdwn b/openpower/sv/rfc/ls010.mdwn index a0f70141f..8084ba104 100644 --- a/openpower/sv/rfc/ls010.mdwn +++ b/openpower/sv/rfc/ls010.mdwn @@ -294,226 +294,6 @@ different: elements that fail the CR test *or* are masked out are zero'd. \newpage{} -# Condition Register SVP64 Operations - -Condition Register Fields are only 4 bits wide: this presents some -interesting conceptual challenges for SVP64, which was designed -primarily for vectors of arithmetic and logical operations. However -if predicates may be bits of CR Fields it makes sense to extend -Simple-V to cover CR Operations, especially given that Vectorised Rc=1 -may be processed by Vectorised CR Operations tbat usefully in turn -may become Predicate Masks to yet more Vector operations, like so: - -``` - sv.cmpi/ew=8 *B,*ra,0 # compare bytes against zero - sv.cmpi/ew=8 *B2,*ra,13.
# and against newline - sv.cror PM.EQ,B.EQ,B2.EQ # OR compares to create mask - sv.stb/sm=EQ ... # store only nonzero/newline -``` - -Element width however is clearly meaningless for a 4-bit collation of -Conditions, EQ LT GE SO. Likewise, arithmetic saturation (an important -part of Arithmetic SVP64) has no meaning. An alternative Mode Format is -required, and given that elwidths are meaningless for CR Fields the bits -in SVP64 `RM` may be used for other purposes. - -This alternative mapping **only** applies to instructions that **only** -reference a CR Field or CR bit as the sole exclusive result. This section -**does not** apply to instructions which primarily produce arithmetic -results that also, as an aside, produce a corresponding CR Field (such as -when Rc=1). Instructions that involve Rc=1 are definitively arithmetic -in nature, where the corresponding Condition Register Field can be -considered to be a "co-result". Such CR Field "co-result" arithmeric -operations are firmly out of scope for this section, being covered fully -by [[sv/normal]]. - -* Examples of v3.0B instructions to which this section does - apply is - - `mfcr` and `cmpi` (3 bit operands) and - - `crnor` and `crand` (5 bit operands). -* Examples to which this section does **not** apply include - `fadds.` and `subf.` which both produce arithmetic results - (and a CR Field co-result). - -The CR Mode Format still applies to `sv.cmpi` because despite -taking a GPR as input, the output from the Base Scalar v3.0B `cmpi` -instruction is purely to a Condition Register Field. - -Other modes are still applicable and include: - -* **Data-dependent fail-first**. - useful to truncate VL based on analysis of a Condition Register result bit. -* **Reduction**. - Reduction is useful for analysing a Vector of Condition Register Fields - and reducing it to one single Condition Register Field. - -Predicate-result does not make any sense because when Rc=1 a co-result -is created (a CR Field). 
Testing the co-result allows the decision to -be made to store or not store the main result, and for CR Ops the CR -Field result *is* the main result. - -## Format - -SVP64 RM `MODE` (includes `ELWIDTH_SRC` bits) for CR-based operations: - -|6 | 7 |19-20| 21 | 22 23 | description | -|--|---|-----| --- |---------|----------------- | -|/ | / |0 RG | 0 | dz sz | simple mode | -|/ | / |0 RG | 1 | dz sz | scalar reduce mode (mapreduce) | -|zz|SNZ|1 VLI| inv | CR-bit | Ffirst 3-bit mode | -|/ |SNZ|1 VLI| inv | dz sz | Ffirst 5-bit mode (implies CR-bit from result) | - -Fields: - -* **sz / dz** if predication is enabled will put zeros into the dest - (or as src in the case of twin pred) when the predicate bit is zero. - otherwise the element is ignored or skipped, depending on context. -* **zz** set both sz and dz equal to this flag -* **SNZ** In fail-first mode, on the bit being tested, when sz=1 and - SNZ=1 a value "1" is put in place of "0". -* **inv CR-bit** just as in branches (BO) these bits allow testing of - a CR bit and whether it is set (inv=0) or unset (inv=1) -* **RG** inverts the Vector Loop order (VL-1 downto 0) rather - than the normal 0..VL-1 -* **SVM** sets "subvector" reduce mode -* **VLi** VL inclusive: in fail-first mode, the truncation of - VL *includes* the current element at the failure point rather - than excludes it from the count. - -## Data-dependent fail-first on CR operations - -The principle of data-dependent fail-first is that if, during the course -of sequentially evaluating an element's Condition Test, one such test -is encountered which fails, then VL (Vector Length) is truncated (set) -at that point. In the case of Arithmetic SVP64 Operations the Condition -Register Field generated from Rc=1 is used as the basis for the truncation -decision. However with CR-based operations that CR Field result to be -tested is provided *by the operation itself*. 
- -Data-dependent SVP64 Vectorised Operations involving the creation -or modification of a CR can require an extra two bits, which are not -available in the compact space of the SVP64 RM `MODE` Field. With the -concept of element width overrides being meaningless for CR Fields it -is possible to use the `ELWIDTH` field for alternative purposes. - -Condition Register based operations such as `sv.mfcr` and `sv.crand` -can thus be made more flexible. However the rules that apply in this -section also apply to future CR-based instructions. - -There are two primary different types of CR operations: - -* Those which have a 3-bit operand field (referring to a CR Field) -* Those which have a 5-bit operand (referring to a bit within the - whole 32-bit CR) - -Examining these two types it is observed that the difference may -be considered to be that the 5-bit variant *already* provides the -prerequisite information about which CR Field bit (EQ, GE, LT, SO) is -to be operated on by the instruction. Thus, logically, we may set the -following rule: - -* When a 5-bit CR Result field is used in an instruction, the - 5-bit variant of Data-Dependent Fail-First - must be used. i.e. the bit of the CR field to be tested is - the one that has just been modified (created) by the operation. -* When a 3-bit CR Result field is used the 3-bit variant - must be used, providing as it does the missing `CRbit` field - in order to select which CR Field bit of the result shall - be tested (EQ, LE, GE, SO) - -The reason why the 3-bit CR variant needs the additional CR-bit field -should be obvious from the fact that the 3-bit CR Field from the base -Power ISA v3.0B operation clearly does not contain and is missing the -two CR Field Selector bits. Thus, these two bits (to select EQ, LE, -GE or SO) must be provided in another way. - -Examples of the former type: - -* crand, cror, crnor. These all are 5-bit (BA, BB, BT). The bit - to be tested against `inv` is the one selected by `BT` -* mcrf. 
This has only 3-bit (BF, BFA). In order to select the - bit to be tested, the alternative encoding must be used. - With `CRbit` coming from the SVP64 RM bits 22-23 the bit - of BF to be tested is identified. - -Just as with SVP64 [[sv/branches]] there is the option to truncate -VL to include the element being tested (`VLi=1`) and to exclude it -(`VLi=0`). - -Also exactly as with [[sv/normal]] fail-first, VL cannot, unlike -[[sv/ldst]], be set to an arbitrary value. Deterministic behaviour -is *required*. - -## Reduction and Iteration - -Bearing in mind as described in the svp64 Appendix, SVP64 Horizontal -Reduction is a deterministic schedule on top of base Scalar v3.0 -operations, the same rules apply to CR Operations, i.e. that programmers -must follow certain conventions in order for an *end result* of a -reduction to be achieved. Unlike other Vector ISAs *there are no explicit -reduction opcodes* in SVP64: Schedules however achieve the same effect. - -Due to these conventions only reduction on operations such as `crand` -and `cror` are meaningful because these have Condition Register Fields -as both input and output. Meaningless operations are not prohibited -because the cost in hardware of doing so is prohibitive, but neither -are they `UNDEFINED`. Implementations are still required to execute them -but are at liberty to optimise out any operations that would ultimately -be overwritten, as long as Strict Program Order is still obvservable by -the programmer. - -Also bear in mind that 'Reverse Gear' may be enabled, which can be -used in combination with overlapping CR operations to iteratively -accumulate results. 
Issuing a `sv.crand` operation for example with -`BA` differing from `BB` by one Condition Register Field would result -in a cascade effect, where the first-encountered CR Field would set the -result to zero, and also all subsequent CR Field elements thereafter: - -``` - # sv.crand/mr/rg CR4.ge.v, CR5.ge.v, CR4.ge.v - for i in VL-1 downto 0 # reverse gear - CR.field[4+i].ge &= CR.field[5+i].ge -``` - -`sv.crxor` with reduction would be particularly useful for parity -calculation for example, although there are many ways in which the same -calculation could be carried out after transferring a vector of CR Fields -to a GPR using crweird operations. - -Implementations are free and clear to optimise these reductions in any way -they see fit, as long as the end-result is compatible with Strict Program -Order being observed, and Interrupt latency is not adversely impacted. - -## Unusual and quirky CR operations - -**cmp and other compare ops** - -`cmp` and `cmpi` etc take GPRs as sources and create a CR Field as a result. - - cmpli BF,L,RA,UI - cmpeqb BF,RA,RB - -With `ELWIDTH` applying to the source GPR operands this is perfectly fine. - -**crweird operations** - -There are 4 weird CR-GPR operations and one reasonable one in -the [[cr_int_predication]] set: - -* crrweird -* mtcrweird -* crweirder -* crweird -* mcrfm - reasonably normal and referring to CR Fields for src and dest. - -The "weird" operations have a non-standard behaviour, being able to -treat *individual bits* of a GPR effectively as elements. They are -expected to be Micro-coded by most Hardware implementations. - [[!tag opf_rfc]] --------- - -\newpage{} - -- 2.30.2