From b770be3738898ca7682d42137cc8c4a804431545 Mon Sep 17 00:00:00 2001 From: lkcl Date: Wed, 29 Mar 2023 14:36:42 +0100 Subject: [PATCH] --- openpower/sv/rfc/ls010.mdwn | 651 +++++++++++++++++++++++++++++++++++- 1 file changed, 649 insertions(+), 2 deletions(-) diff --git a/openpower/sv/rfc/ls010.mdwn b/openpower/sv/rfc/ls010.mdwn index e5aa21a94..e30326f2b 100644 --- a/openpower/sv/rfc/ls010.mdwn +++ b/openpower/sv/rfc/ls010.mdwn @@ -1,3 +1,650 @@ -# RFC ls009 SVP64 Zero-Overhead Loop Subsystem +# RFC ls009 SVP64 Zero-Overhead Loop Prefix Subsystem + +Credits and acknowledgements: + +* Luke Leighton +* Jacob Lifshay +* Hendrik Boom +* Richard Wilbur +* Alexandre Oliva +* Cesar Strauss +* NLnet Foundation, for funding +* OpenPOWER Foundation +* Paul Mackerras +* Toshaan Bharvani +* IBM for the Power ISA itself + +Links: + +* + +# Introduction + +This document focuses on the encoding of [[SV|sv]], and assumes familiarity with the same. It does not cover how SV works (merely the instruction encoding), and is therefore best read in conjunction with the [[sv/overview]], as well as the [[sv/svp64_quirks]] section. +It is also crucial to note that whilst this format augments instruction +behaviour it works in conjunction with SVSTATE and other [[sv/sprs]]. + +All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB +on the left +and counting up as you move rightwards to the LSB end). All bit ranges are inclusive +(so `4:6` means bits 4, 5, and 6, in MSB0 order). **All register numbering and +element numbering however is LSB0 ordering** which is a different convention used +elsewhere in the Power ISA. + +64-bit instructions are split into two 32-bit words, the prefix and the +suffix. The prefix always comes before the suffix in PC order. + +| 0:5 | 6:31 | 32:63 | +|--------|--------------|--------------| +| EXT01 | v3.1 Prefix | v3.0/1 Suffix | + +svp64 fits into the "reserved" portions of the v3.1 prefix, making it possible for svp64, v3.0B (or v3.1 including 64 bit prefixed) instructions to co-exist in the same binary without conflict. + +Subset implementations in hardware are permitted, as long as certain +rules are followed, allowing for full soft-emulation including future +revisions. Details in the [[svp64/appendix]]. + +## SVP64 encoding features + +A number of features need to be compacted into a very small space of only 24 bits: + +* Independent per-register Scalar/Vector tagging and range extension on every register +* Element width overrides on both source and destination +* Predication on both source and destination +* Two different sources of predication: INT and CR Fields +* SV Modes including saturation (for Audio, Video and DSP), mapreduce, fail-first and + predicate-result mode. + +This document focusses specifically on how that fits into available space. The [[svp64/appendix]] explains more of the details, whilst the [[sv/overview]] gives the basics. + +# Definition of Reserved in this spec. + +For the new fields added in SVP64, instructions that have any of their +fields set to a reserved value must cause an illegal instruction trap, +to allow emulation of future instruction sets, or for subsets of SVP64 +to be implemented in hardware and the rest emulated. +This includes SVP64 SPRs: reading or writing values which are not +supported in hardware must also raise illegal instruction traps +in order to allow emulation. +Unless otherwise stated, reserved values are always all zeros. + +This is unlike OpenPower ISA v3.1, which in many instances does not require a trap if reserved fields are nonzero. Where the standard Power ISA definition +is intended the red keyword `RESERVED` is used. + +# Scalar Identity Behaviour + +SVP64 is designed so that when the prefix is all zeros, and + VL=1, no effect or +influence occurs (no augmentation) such that all standard Power ISA +v3.0/v3 1 instructions covered by the prefix are "unaltered". This is termed `scalar identity behaviour` (based on the mathematical definition for "identity", as in, "identity matrix" or better "identity transformation"). + +Note that this is completely different from when VL=0. VL=0 turns all operations under its influence into `nops` (regardless of the prefix) + whereas when VL=1 and the SV prefix is all zeros, the operation simply acts as if SV had not been applied at all to the instruction (an "identity transformation"). + +# Register Naming and size + +SV Registers are simply the INT, FP and CR register files extended +linearly to larger sizes; SV Vectorisation iterates sequentially through these registers. + +Where the integer regfile in standard scalar +Power ISA v3.0B/v3.1B is r0 to r31, SV extends this as r0 to r127. +Likewise FP registers are extended to 128 (fp0 to fp127), and CR Fields +are +extended to 128 entries, CR0 thru CR127. + +The names of the registers therefore reflects a simple linear extension +of the Power ISA v3.0B / v3.1B register naming, and in hardware this +would be reflected by a linear increase in the size of the underlying +SRAM used for the regfiles. + +Note: when an EXTRA field (defined below) is zero, SV is deliberately designed +so that the register fields are identical to as if SV was not in effect +i.e. under these circumstances (EXTRA=0) the register field names RA, +RB etc. are interpreted and treated as v3.0B / v3.1B scalar registers. This is part of +`scalar identity behaviour` described above. + +## Future expansion. + +With the way that EXTRA fields are defined and applied to register fields, +future versions of SV may involve 256 or greater registers. Backwards binary compatibility may be achieved with a PCR bit (Program Compatibility Register). Further discussion is out of scope for this version of SVP64. + +# Remapped Encoding (`RM[0:23]`) + +To allow relatively easy remapping of which portions of the Prefix Opcode +Map are used for SVP64 without needing to rewrite a large portion of the +SVP64 spec, a mapping is defined from the OpenPower v3.1 prefix bits to +a new 24-bit Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` +at the LSB. + +The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding +is defined in the Prefix Fields section. + +## Prefix Opcode Map (64-bit instruction encoding) + +In the original table in the v3.1B Power ISA Spec on p1350, Table 12, prefix bits 6:11 are shown, with their allocations to different v3.1B pregix "modes". + +The table below hows both PowerISA v3.1 instructions as well as new SVP instructions fit; +empty spaces are yet-to-be-allocated Illegal Instructions. + +| 6:11 | ---000 | ---001 | ---010 | ---011 | ---100 | ---101 | ---110 | ---111 | +|------|--------|--------|--------|--------|--------|--------|--------|--------| +|000---| 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | +|001---| | | | | | | | | +|010---| 8RR | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| +|011---| | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| +|100---| MLS | MLS | MLS | MLS | MLS | MLS | MLS | MLS | +|101---| | | | | | | | | +|110---| MRR | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| +|111---| | MMIRR | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| + +Note that by taking up a block of 16, where in every case bits 7 and 9 are set, this allows svp64 to utilise four bits of the v3.1B Prefix space and "allocate" them to svp64's Remapped Encoding field, instead. + +## Prefix Fields + +To "activate" svp64 (in a way that does not conflict with v3.1B 64 bit Prefix mode), fields within the v3.1B Prefix Opcode Map are set +(see Prefix Opcode Map, above), leaving 24 bits "free" for use by SV. +This is achieved by setting bits 7 and 9 to 1: + +| Name | Bits | Value | Description | +|------------|---------|-------|--------------------------------| +| EXT01 | `0:5` | `1` | Indicates Prefixed 64-bit | +| `RM[0]` | `6` | | Bit 0 of Remapped Encoding | +| SVP64_7 | `7` | `1` | Indicates this is SVP64 | +| `RM[1]` | `8` | | Bit 1 of Remapped Encoding | +| SVP64_9 | `9` | `1` | Indicates this is SVP64 | +| `RM[2:23]` | `10:31` | | Bits 2-23 of Remapped Encoding | + +Laid out bitwise, this is as follows, showing how the 32-bits of the prefix +are constructed: + +| 0:5 | 6 | 7 | 8 | 9 | 10:31 | +|--------|-------|---|-------|---|----------| +| EXT01 | RM | 1 | RM | 1 | RM | +| 000001 | RM[0] | 1 | RM[1] | 1 | RM[2:23] | + +Following the prefix will be the suffix: this is simply a 32-bit v3.0B / v3.1 +instruction. That instruction becomes "prefixed" with the SVP context: the +Remapped Encoding field (RM). + +It is important to note that unlike v3.1 64-bit prefixed instructions +there is insufficient space in `RM` to provide identification of +any SVP64 Fields without first partially decoding the +32-bit suffix. Similar to the "Forms" (X-Form, D-Form) the +`RM` format is individually associated with every instruction. + +Extreme caution and care must therefore be taken +when extending SVP64 in future, to not create unnecessary relationships +between prefix and suffix that could complicate decoding, adding latency. + +# Common RM fields + +The following fields are common to all Remapped Encodings: + +| Field Name | Field bits | Description | +|------------|------------|----------------------------------------| +| MASKMODE | `0` | Execution (predication) Mask Kind | +| MASK | `1:3` | Execution Mask | +| SUBVL | `8:9` | Sub-vector length | + +The following fields are optional or encoded differently depending +on context after decoding of the Scalar suffix: + +| Field Name | Field bits | Description | +|------------|------------|----------------------------------------| +| ELWIDTH | `4:5` | Element Width | +| ELWIDTH_SRC | `6:7` | Element Width for Source | +| EXTRA | `10:18` | Register Extra encoding | +| MODE | `19:23` | changes Vector behaviour | + +* MODE changes the behaviour of the SV operation (result saturation, mapreduce) +* SUBVL groups elements together into vec2, vec3, vec4 for use in 3D and Audio/Video DSP work +* ELWIDTH and ELWIDTH_SRC overrides the instruction's destination and source operand width +* MASK (and MASK_SRC) and MASKMODE provide predication (two types of sources: scalar INT and Vector CR). +* Bits 10 to 18 (EXTRA) are further decoded depending on the RM category for the instruction, which is determined only by decoding the Scalar 32 bit suffix. + +Similar to Power ISA `X-Form` etc. EXTRA bits are given designations, such as `RM-1P-3S1D` which indicates for this example that the operation is to be single-predicated and that there are 3 source operand EXTRA tags and one destination operand tag. + +Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance or increased latency in some implementations due to lane-crossing. + +# Mode + +Mode is an augmentation of SV behaviour. Different types of +instructions have different needs, similar to Power ISA +v3.1 64 bit prefix 8LS and MTRR formats apply to different +instruction types. Modes include Reduction, Iteration, arithmetic +saturation, and Fail-First. More specific details in each +section and in the [[svp64/appendix]] + +* For condition register operations see [[sv/cr_ops]] +* For LD/ST Modes, see [[sv/ldst]]. +* For Branch modes, see [[sv/branches]] +* For arithmetic and logical, see [[sv/normal]] + +# ELWIDTH Encoding + +Default behaviour is set to 0b00 so that zeros follow the convention of +`scalar identity behaviour`. In this case it means that elwidth overrides +are not applicable. Thus if a 32 bit instruction operates on 32 bit, +`elwidth=0b00` specifies that this behaviour is unmodified. Likewise +when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00` +states that, again, the behaviour is not to be modified. + +Only when elwidth is nonzero is the element width overridden to the +explicitly required value. + +## Elwidth for Integers: + +| Value | Mnemonic | Description | +|-------|----------------|------------------------------------| +| 00 | DEFAULT | default behaviour for operation | +| 01 | `ELWIDTH=w` | Word: 32-bit integer | +| 10 | `ELWIDTH=h` | Halfword: 16-bit integer | +| 11 | `ELWIDTH=b` | Byte: 8-bit integer | + +This encoding is chosen such that the byte width may be computed as +`8<<(3-ew)` + +## Elwidth for FP Registers: + +| Value | Mnemonic | Description | +|-------|----------------|------------------------------------| +| 00 | DEFAULT | default behaviour for FP operation | +| 01 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point | +| 10 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point | +| 11 | `ELWIDTH=bf16` | Reserved for `bf16` | + +Note: +[`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) +is reserved for a future implementation of SV + +Note that any IEEE754 FP operation in Power ISA ending in "s" (`fadds`) shall +perform its operation at **half** the ELWIDTH then padded back out +to ELWIDTH. `sv.fadds/ew=f32` shall perform an IEEE754 FP16 operation that is then "padded" to fill out to an IEEE754 FP32. When ELWIDTH=DEFAULT +clearly the behaviour of `sv.fadds` is performed at 32-bit accuracy +then padded back out to fit in IEEE754 FP64, exactly as for Scalar +v3.0B "single" FP. Any FP operation ending in "s" where ELWIDTH=f16 +or ELWIDTH=bf16 is reserved and must raise an illegal instruction +(IEEE754 FP8 or BF8 are not defined). + +## Elwidth for CRs: + +Element-width overrides for CR Fields has no meaning. The bits +are therefore used for other purposes, or when Rc=1, the Elwidth +applies to the result being tested (a GPR or FPR), but not to the +Vector of CR Fields. + +# SUBVL Encoding + +the default for SUBVL is 1 and its encoding is 0b00 to indicate that +SUBVL is effectively disabled (a SUBVL for-loop of only one element). this +lines up in combination with all other "default is all zeros" behaviour. + +| Value | Mnemonic | Subvec | Description | +|-------|-----------|---------|------------------------| +| 00 | `SUBVL=1` | single | Sub-vector length of 1 | +| 01 | `SUBVL=2` | vec2 | Sub-vector length of 2 | +| 10 | `SUBVL=3` | vec3 | Sub-vector length of 3 | +| 11 | `SUBVL=4` | vec4 | Sub-vector length of 4 | + +The SUBVL encoding value may be thought of as an inclusive range of a +sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore +this may be considered to be elements 0b00 to 0b01 inclusive. + +# MASK/MASK_SRC & MASKMODE Encoding + +TODO: rename MASK_KIND to MASKMODE + +One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two +types may not be mixed. + +Special note: to disable predication this field must +be set to zero in combination with Integer Predication also being set +to 0b000. this has the effect of enabling "all 1s" in the predicate +mask, which is equivalent to "not having any predication at all" +and consequently, in combination with all other default zeros, fully +disables SV (`scalar identity behaviour`). + +`MASKMODE` may be set to one of 2 values: + +| Value | Description | +|-----------|------------------------------------------------------| +| 0 | MASK/MASK_SRC are encoded using Integer Predication | +| 1 | MASK/MASK_SRC are encoded using CR-based Predication | + +Integer Twin predication has a second set of 3 bits that uses the same +encoding thus allowing either the same register (r3, r10 or r31) to be used +for both src and dest, or different regs (one for src, one for dest). + +Likewise CR based twin predication has a second set of 3 bits, allowing +a different test to be applied. + +Note that it is assumed that Predicate Masks (whether INT or CR) +are read *before* the operations proceed. In practice (for CR Fields) +this creates an unnecessary block on parallelism. Therefore, +it is up to the programmer to ensure that the CR fields used as +Predicate Masks are not being written to by any parallel Vector Loop. +Doing so results in **UNDEFINED** behaviour, according to the definition +outlined in the Power ISA v3.0B Specification. + +Hardware Implementations are therefore free and clear to delay reading +of individual CR fields until the actual predicated element operation +needs to take place, safe in the knowledge that no programmer will +have issued a Vector Instruction where previous elements could have +overwritten (destroyed) not-yet-executed CR-Predicated element operations. + +## Integer Predication (MASKMODE=0) + +When the predicate mode bit is zero the 3 bits are interpreted as below. +Twin predication has an identical 3 bit field similarly encoded. + +`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the following meaning: + +| Value | Mnemonic | Element `i` enabled if: | +|-------|----------|------------------------------| +| 000 | ALWAYS | predicate effectively all 1s | +| 001 | 1 << R3 | `i == R3` | +| 010 | R3 | `R3 & (1 << i)` is non-zero | +| 011 | ~R3 | `R3 & (1 << i)` is zero | +| 100 | R10 | `R10 & (1 << i)` is non-zero | +| 101 | ~R10 | `R10 & (1 << i)` is zero | +| 110 | R30 | `R30 & (1 << i)` is non-zero | +| 111 | ~R30 | `R30 & (1 << i)` is zero | + +r10 and r30 are at the high end of temporary and unused registers, so as not to interfere with register allocation from ABIs. + +## CR-based Predication (MASKMODE=1) + +When the predicate mode bit is one the 3 bits are interpreted as below. +Twin predication has an identical 3 bit field similarly encoded. + +`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the following meaning: + +| Value | Mnemonic | Element `i` is enabled if | +|-------|----------|--------------------------| +| 000 | lt | `CR[offs+i].LT` is set | +| 001 | nl/ge | `CR[offs+i].LT` is clear | +| 010 | gt | `CR[offs+i].GT` is set | +| 011 | ng/le | `CR[offs+i].GT` is clear | +| 100 | eq | `CR[offs+i].EQ` is set | +| 101 | ne | `CR[offs+i].EQ` is clear | +| 110 | so/un | `CR[offs+i].FU` is set | +| 111 | ns/nu | `CR[offs+i].FU` is clear | + +CR based predication. TODO: select alternate CR for twin predication? see +[[discussion]] Overlap of the two CR based predicates must be taken +into account, so the starting point for one of them must be suitably +high, or accept that for twin predication VL must not exceed the range +where overlap will occur, *or* that they use the same starting point +but select different *bits* of the same CRs + +`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD). + +The CR Predicates chosen must start on a boundary that Vectorised +CR operations can access cleanly, in full. +With EXTRA2 restricting starting points +to multiples of 8 (CR0, CR8, CR16...) both Vectorised Rc=1 and CR Predicate +Masks have to be adapted to fit on these boundaries as well. + +# Extra Remapped Encoding + +Shows all instruction-specific fields in the Remapped Encoding `RM[10:18]` for all instruction variants. Note that due to the very tight space, the encoding mode is *not* included in the prefix itself. The mode is "applied", similar to Power ISA "Forms" (X-Form, D-Form) on a per-instruction basis, and, like "Forms" are given a designation (below) of the form `RM-nP-nSnD`. The full list of which instructions use which remaps is here [[opcode_regs_deduped]]. (*Machine-readable CSV files have been provided which will make the task of creating SV-aware ISA decoders easier*). + +These mappings are part of the SVP64 Specification in exactly the same +way as X-Form, D-Form. New Scalar instructions added to the Power ISA +will need a corresponding SVP64 Mapping, which can be derived by-rote +from examining the Register "Profile" of the instruction. + +There are two categories: Single and Twin Predication. +Due to space considerations further subdivision of Single Predication +is based on whether the number of src operands is 2 or 3. With only +9 bits available some compromises have to be made. + +* `RM-1P-3S1D` Single Predication dest/src1/2/3, applies to 4-operand instructions (fmadd, isel, madd). +* `RM-1P-2S1D` Single Predication dest/src1/2 applies to 3-operand instructions (src1 src2 dest) +* `RM-2P-1S1D` Twin Predication (src=1, dest=1) +* `RM-2P-2S1D` Twin Predication (src=2, dest=1) primarily for LDST (Indexed) +* `RM-2P-1S2D` Twin Predication (src=1, dest=2) primarily for LDST Update + +## RM-1P-3S1D + +| Field Name | Field bits | Description | +|------------|------------|----------------------------------------| +| Rdest\_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) | +| Rsrc1\_EXTRA2 | `12:13` | extends Rsrc1 (R\*\_EXTRA2 Encoding) | +| Rsrc2\_EXTRA2 | `14:15` | extends Rsrc2 (R\*\_EXTRA2 Encoding) | +| Rsrc3\_EXTRA2 | `16:17` | extends Rsrc3 (R\*\_EXTRA2 Encoding) | +| EXTRA2_MODE | `18` | used by `divmod2du` and `maddedu` for RS | + +These are for 3 operand in and either 1 or 2 out instructions. +3-in 1-out includes `madd RT,RA,RB,RC`. (DRAFT) instructions +such as `maddedu` have an implicit second destination, RS, the +selection of which is determined by bit 18. + +## RM-1P-2S1D + +| Field Name | Field bits | Description | +|------------|------------|-------------------------------------------| +| Rdest\_EXTRA3 | `10:12` | extends Rdest | +| Rsrc1\_EXTRA3 | `13:15` | extends Rsrc1 | +| Rsrc2\_EXTRA3 | `16:18` | extends Rsrc3 | + +These are for 2 operand 1 dest instructions, such as `add RT, RA, +RB`. However also included are unusual instructions with an implicit dest +that is identical to its src reg, such as `rlwinmi`. + +Normally, with instructions such as `rlwinmi`, the scalar v3.0B ISA would not have sufficient bit fields to allow +an alternative destination. With SV however this becomes possible. +Therefore, the fact that the dest is implicitly also a src should not +mislead: due to the *prefix* they are different SV regs. + +* `rlwimi RA, RS, ...` +* Rsrc1_EXTRA3 applies to RS as the first src +* Rsrc2_EXTRA3 applies to RA as the secomd src +* Rdest_EXTRA3 applies to RA to create an **independent** dest. + +With the addition of the EXTRA bits, the three registers +each may be *independently* made vector or scalar, and be independently +augmented to 7 bits in length. + +## RM-2P-1S1D/2S + +| Field Name | Field bits | Description | +|------------|------------|----------------------------| +| Rdest_EXTRA3 | `10:12` | extends Rdest | +| Rsrc1_EXTRA3 | `13:15` | extends Rsrc1 | +| MASK_SRC | `16:18` | Execution Mask for Source | + +`RM-2P-2S` is for `stw` etc. and is Rsrc1 Rsrc2. + +## RM-1P-2S1D + +single-predicate, three registers (2 read, 1 write) + +| Field Name | Field bits | Description | +|------------|------------|----------------------------| +| Rdest_EXTRA3 | `10:12` | extends Rdest | +| Rsrc1_EXTRA3 | `13:15` | extends Rsrc1 | +| Rsrc2_EXTRA3 | `16:18` | extends Rsrc2 | + +## RM-2P-2S1D/1S2D/3S + +The primary purpose for this encoding is for Twin Predication on LOAD +and STORE operations. see [[sv/ldst]] for detailed anslysis. + +RM-2P-2S1D: + +| Field Name | Field bits | Description | +|------------|------------|----------------------------| +| Rdest_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) | +| Rsrc1_EXTRA2 | `12:13` | extends Rsrc1 (R\*\_EXTRA2 Encoding) | +| Rsrc2_EXTRA2 | `14:15` | extends Rsrc2 (R\*\_EXTRA2 Encoding) | +| MASK_SRC | `16:18` | Execution Mask for Source | + +Note that for 1S2P the EXTRA2 dest and src names are switched (Rsrc_EXTRA2 +is in bits 10:11, Rdest1_EXTRA2 in 12:13) + +Also that for 3S (to cover `stdx` etc.) the names are switched to 3 src: Rsrc1_EXTRA2, Rsrc2_EXTRA2, Rsrc3_EXTRA2. + +Note also that LD with update indexed, which takes 2 src and 2 dest +(e.g. `lhaux RT,RA,RB`), does not have room for 4 registers and also +Twin Predication. therefore these are treated as RM-2P-2S1D and the +src spec for RA is also used for the same RA as a dest. + +Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance or increased latency in some implementations due to lane-crossing. + +# R\*\_EXTRA2/3 + +EXTRA is the means by which two things are achieved: + +1. Registers are marked as either Vector *or Scalar* +2. Register field numbers (limited typically to 5 bit) + are extended in range, both for Scalar and Vector. + +The register files are therefore extended: + +* INT is extended from r0-31 to r0-127 +* FP is extended from fp0-32 to fp0-fp127 +* CR Fields are extended from CR0-7 to CR0-127 + +However due to pressure in `RM.EXTRA` not all these registers +are accessible by all instructions, particularly those with +a large number of operands (`madd`, `isel`). + +In the following tables register numbers are constructed from the +standard v3.0B / v3.1B 32 bit register field (RA, FRA) and the EXTRA2 +or EXTRA3 field from the SV Prefix, determined by the specific +RM-xx-yyyy designation for a given instruction. +The prefixing is arranged so that +interoperability between prefixing and nonprefixing of scalar registers +is direct and convenient (when the EXTRA field is all zeros). + +A pseudocode algorithm explains the relationship, for INT/FP (see [[svp64/appendix]] for CRs) + +``` + if extra3_mode: + spec = EXTRA3 + else: + spec = EXTRA2 << 1 # same as EXTRA3, shifted + if spec[0]: # vector + return (RA << 2) | spec[1:2] + else: # scalar + return (spec[1:2] << 5) | RA +``` + +Future versions may extend to 256 by shifting Vector numbering up. +Scalar will not be altered. + +Note that in some cases the range of starting points for Vectors +is limited. + +## INT/FP EXTRA3 + +If EXTRA3 is zero, maps to +"scalar identity" (scalar Power ISA field naming). + +Fields are as follows: + +* Value: R_EXTRA3 +* Mode: register is tagged as scalar or vector +* Range/Inc: the range of registers accessible from this EXTRA + encoding, and the "increment" (accessibility). "/4" means + that this EXTRA encoding may only give access (starting point) + every 4th register. +* MSB..LSB: the bit field showing how the register opcode field + combines with EXTRA to give (extend) the register number (GPR) + +| Value | Mode | Range/Inc | 6..0 | +|-----------|-------|---------------|---------------------| +| 000 | Scalar | `r0-r31`/1 | `0b00 RA` | +| 001 | Scalar | `r32-r63`/1 | `0b01 RA` | +| 010 | Scalar | `r64-r95`/1 | `0b10 RA` | +| 011 | Scalar | `r96-r127`/1 | `0b11 RA` | +| 100 | Vector | `r0-r124`/4 | `RA 0b00` | +| 101 | Vector | `r1-r125`/4 | `RA 0b01` | +| 110 | Vector | `r2-r126`/4 | `RA 0b10` | +| 111 | Vector | `r3-r127`/4 | `RA 0b11` | + +## INT/FP EXTRA2 + +If EXTRA2 is zero will map to +"scalar identity behaviour" i.e Scalar Power ISA register naming: + +| Value | Mode | Range/inc | 6..0 | +|-----------|-------|---------------|-----------| +| 00 | Scalar | `r0-r31`/1 | `0b00 RA` | +| 01 | Scalar | `r32-r63`/1 | `0b01 RA` | +| 10 | Vector | `r0-r124`/4 | `RA 0b00` | +| 11 | Vector | `r2-r126`/4 | `RA 0b10` | + +**Note that unlike in EXTRA3, in EXTRA2**: + +* the GPR Vectors may only start from + `r0, r2, r4, r6, r8` and likewise FPR Vectors. +* the GPR Scalars may only go from `r0, r1, r2.. r63` and likewise FPR Scalars. + +as there is insufficient bits to cover the full range. + +## CR Field EXTRA3 + +CR Field encoding is essentially the same but made more complex due to CRs being bit-based. See [[svp64/appendix]] for explanation and pseudocode. +Note that Vectors may only start from `CR0, CR4, CR8, CR12, CR16, CR20`... +and Scalars may only go from `CR0, CR1, ... CR31` + +Encoding shown MSB down to LSB + +For a 5-bit operand (BA, BB, BT): + +| Value | Mode | Range/Inc | 8..5 | 4..2 | 1..0 | +|-------|------|---------------|-----------| --------|---------| +| 000 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[4:2] | BA[1:0] | +| 001 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[4:2] | BA[1:0] | +| 010 | Scalar | `CR16-CR23`/1 | 0b0010 | BA[4:2] | BA[1:0] | +| 011 | Scalar | `CR24-CR31`/1 | 0b0011 | BA[4:2] | BA[1:0] | +| 100 | Vector | `CR0-CR112`/16 | BA[4:2] 0 | 0b000 | BA[1:0] | +| 101 | Vector | `CR4-CR116`/16 | BA[4:2] 0 | 0b100 | BA[1:0] | +| 110 | Vector | `CR8-CR120`/16 | BA[4:2] 1 | 0b000 | BA[1:0] | +| 111 | Vector | `CR12-CR124`/16 | BA[4:2] 1 | 0b100 | BA[1:0] | + +For a 3-bit operand (e.g. BFA): + +| Value | Mode | Range/Inc | 6..3 | 2..0 | +|-------|------|---------------|-----------| --------| +| 000 | Scalar | `CR0-CR7`/1 | 0b0000 | BFA | +| 001 | Scalar | `CR8-CR15`/1 | 0b0001 | BFA | +| 010 | Scalar | `CR16-CR23`/1 | 0b0010 | BFA | +| 011 | Scalar | `CR24-CR31`/1 | 0b0011 | BFA | +| 100 | Vector | `CR0-CR112`/16 | BFA 0 | 0b000 | +| 101 | Vector | `CR4-CR116`/16 | BFA 0 | 0b100 | +| 110 | Vector | `CR8-CR120`/16 | BFA 1 | 0b000 | +| 111 | Vector | `CR12-CR124`/16 | BFA 1 | 0b100 | + +## CR EXTRA2 + +CR encoding is essentially the same but made more complex due to CRs being bit-based. See separate section for explanation and pseudocode. +Note that Vectors may only start from CR0, CR8, CR16, CR24, CR32... + + +Encoding shown MSB down to LSB + +For a 5-bit operand (BA, BB, BC): + +| Value | Mode | Range/Inc | 8..5 | 4..2 | 1..0 | +|-------|--------|----------------|---------|---------|---------| +| 00 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[4:2] | BA[1:0] | +| 01 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[4:2] | BA[1:0] | +| 10 | Vector | `CR0-CR112`/16 | BA[4:2] 0 | 0b000 | BA[1:0] | +| 11 | Vector | `CR8-CR120`/16 | BA[4:2] 1 | 0b000 | BA[1:0] | + +For a 3-bit operand (e.g. BFA): + +| Value | Mode | Range/Inc | 6..3 | 2..0 | +|-------|------|---------------|-----------| --------| +| 00 | Scalar | `CR0-CR7`/1 | 0b0000 | BFA | +| 01 | Scalar | `CR8-CR15`/1 | 0b0001 | BFA | +| 10 | Vector | `CR0-CR112`/16 | BFA 0 | 0b000 | +| 11 | Vector | `CR8-CR120`/16 | BFA 1 | 0b000 | + +# Appendix + +Now at its own page: [[svp64/appendix]] -TODO -- 2.30.2