-# RFC ls009 SVP64 Zero-Overhead Loop Subsystem
+# RFC ls009 SVP64 Zero-Overhead Loop Prefix Subsystem
+
+Credits and acknowledgements:
+
+* Luke Leighton
+* Jacob Lifshay
+* Hendrik Boom
+* Richard Wilbur
+* Alexandre Oliva
+* Cesar Strauss
+* NLnet Foundation, for funding
+* OpenPOWER Foundation
+* Paul Mackerras
+* Toshaan Bharvani
+* IBM for the Power ISA itself
+
+Links:
+
+* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
+
+# Introduction
+
+This document focuses on the encoding of [[SV|sv]], and assumes familiarity with the same. It does not cover how SV works (merely the instruction encoding), and is therefore best read in conjunction with the [[sv/overview]], as well as the [[sv/svp64_quirks]] section.
+It is also crucial to note that whilst this format augments instruction
+behaviour it works in conjunction with SVSTATE and other [[sv/sprs]].
+
+All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB
+on the left
+and counting up as you move rightwards to the LSB end). All bit ranges are inclusive
+(so `4:6` means bits 4, 5, and 6, in MSB0 order). **All register numbering and
+element numbering however is LSB0 ordering** which is a different convention used
+elsewhere in the Power ISA.
+
+64-bit instructions are split into two 32-bit words, the prefix and the
+suffix. The prefix always comes before the suffix in PC order.
+
+| 0:5 | 6:31 | 32:63 |
+|--------|--------------|--------------|
+| EXT01 | v3.1 Prefix | v3.0/1 Suffix |
+
+svp64 fits into the "reserved" portions of the v3.1 prefix, making it possible for svp64, v3.0B (or v3.1 including 64 bit prefixed) instructions to co-exist in the same binary without conflict.
+
+Subset implementations in hardware are permitted, as long as certain
+rules are followed, allowing for full soft-emulation including future
+revisions. Details in the [[svp64/appendix]].
+
+## SVP64 encoding features
+
+A number of features need to be compacted into a very small space of only 24 bits:
+
+* Independent per-register Scalar/Vector tagging and range extension on every register
+* Element width overrides on both source and destination
+* Predication on both source and destination
+* Two different sources of predication: INT and CR Fields
+* SV Modes including saturation (for Audio, Video and DSP), mapreduce, fail-first and
+ predicate-result mode.
+
+This document focusses specifically on how that fits into available space. The [[svp64/appendix]] explains more of the details, whilst the [[sv/overview]] gives the basics.
+
+# Definition of Reserved in this spec.
+
+For the new fields added in SVP64, instructions that have any of their
+fields set to a reserved value must cause an illegal instruction trap,
+to allow emulation of future instruction sets, or for subsets of SVP64
+to be implemented in hardware and the rest emulated.
+This includes SVP64 SPRs: reading or writing values which are not
+supported in hardware must also raise illegal instruction traps
+in order to allow emulation.
+Unless otherwise stated, reserved values are always all zeros.
+
+This is unlike OpenPower ISA v3.1, which in many instances does not require a trap if reserved fields are nonzero. Where the standard Power ISA definition
+is intended the red keyword `RESERVED` is used.
+
+# Scalar Identity Behaviour
+
+SVP64 is designed so that when the prefix is all zeros, and
+ VL=1, no effect or
+influence occurs (no augmentation) such that all standard Power ISA
+v3.0/v3 1 instructions covered by the prefix are "unaltered". This is termed `scalar identity behaviour` (based on the mathematical definition for "identity", as in, "identity matrix" or better "identity transformation").
+
+Note that this is completely different from when VL=0. VL=0 turns all operations under its influence into `nops` (regardless of the prefix)
+ whereas when VL=1 and the SV prefix is all zeros, the operation simply acts as if SV had not been applied at all to the instruction (an "identity transformation").
+
+# Register Naming and size
+
+SV Registers are simply the INT, FP and CR register files extended
+linearly to larger sizes; SV Vectorisation iterates sequentially through these registers.
+
+Where the integer regfile in standard scalar
+Power ISA v3.0B/v3.1B is r0 to r31, SV extends this as r0 to r127.
+Likewise FP registers are extended to 128 (fp0 to fp127), and CR Fields
+are
+extended to 128 entries, CR0 thru CR127.
+
+The names of the registers therefore reflects a simple linear extension
+of the Power ISA v3.0B / v3.1B register naming, and in hardware this
+would be reflected by a linear increase in the size of the underlying
+SRAM used for the regfiles.
+
+Note: when an EXTRA field (defined below) is zero, SV is deliberately designed
+so that the register fields are identical to as if SV was not in effect
+i.e. under these circumstances (EXTRA=0) the register field names RA,
+RB etc. are interpreted and treated as v3.0B / v3.1B scalar registers. This is part of
+`scalar identity behaviour` described above.
+
+## Future expansion.
+
+With the way that EXTRA fields are defined and applied to register fields,
+future versions of SV may involve 256 or greater registers. Backwards binary compatibility may be achieved with a PCR bit (Program Compatibility Register). Further discussion is out of scope for this version of SVP64.
+
+# Remapped Encoding (`RM[0:23]`)
+
+To allow relatively easy remapping of which portions of the Prefix Opcode
+Map are used for SVP64 without needing to rewrite a large portion of the
+SVP64 spec, a mapping is defined from the OpenPower v3.1 prefix bits to
+a new 24-bit Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]`
+at the LSB.
+
+The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding
+is defined in the Prefix Fields section.
+
+## Prefix Opcode Map (64-bit instruction encoding)
+
+In the original table in the v3.1B Power ISA Spec on p1350, Table 12, prefix bits 6:11 are shown, with their allocations to different v3.1B pregix "modes".
+
+The table below hows both PowerISA v3.1 instructions as well as new SVP instructions fit;
+empty spaces are yet-to-be-allocated Illegal Instructions.
+
+| 6:11 | ---000 | ---001 | ---010 | ---011 | ---100 | ---101 | ---110 | ---111 |
+|------|--------|--------|--------|--------|--------|--------|--------|--------|
+|000---| 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | 8LS |
+|001---| | | | | | | | |
+|010---| 8RR | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`|
+|011---| | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`|
+|100---| MLS | MLS | MLS | MLS | MLS | MLS | MLS | MLS |
+|101---| | | | | | | | |
+|110---| MRR | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`|
+|111---| | MMIRR | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`|
+
+Note that by taking up a block of 16, where in every case bits 7 and 9 are set, this allows svp64 to utilise four bits of the v3.1B Prefix space and "allocate" them to svp64's Remapped Encoding field, instead.
+
+## Prefix Fields
+
+To "activate" svp64 (in a way that does not conflict with v3.1B 64 bit Prefix mode), fields within the v3.1B Prefix Opcode Map are set
+(see Prefix Opcode Map, above), leaving 24 bits "free" for use by SV.
+This is achieved by setting bits 7 and 9 to 1:
+
+| Name | Bits | Value | Description |
+|------------|---------|-------|--------------------------------|
+| EXT01 | `0:5` | `1` | Indicates Prefixed 64-bit |
+| `RM[0]` | `6` | | Bit 0 of Remapped Encoding |
+| SVP64_7 | `7` | `1` | Indicates this is SVP64 |
+| `RM[1]` | `8` | | Bit 1 of Remapped Encoding |
+| SVP64_9 | `9` | `1` | Indicates this is SVP64 |
+| `RM[2:23]` | `10:31` | | Bits 2-23 of Remapped Encoding |
+
+Laid out bitwise, this is as follows, showing how the 32-bits of the prefix
+are constructed:
+
+| 0:5 | 6 | 7 | 8 | 9 | 10:31 |
+|--------|-------|---|-------|---|----------|
+| EXT01 | RM | 1 | RM | 1 | RM |
+| 000001 | RM[0] | 1 | RM[1] | 1 | RM[2:23] |
+
+Following the prefix will be the suffix: this is simply a 32-bit v3.0B / v3.1
+instruction. That instruction becomes "prefixed" with the SVP context: the
+Remapped Encoding field (RM).
+
+It is important to note that unlike v3.1 64-bit prefixed instructions
+there is insufficient space in `RM` to provide identification of
+any SVP64 Fields without first partially decoding the
+32-bit suffix. Similar to the "Forms" (X-Form, D-Form) the
+`RM` format is individually associated with every instruction.
+
+Extreme caution and care must therefore be taken
+when extending SVP64 in future, to not create unnecessary relationships
+between prefix and suffix that could complicate decoding, adding latency.
+
+# Common RM fields
+
+The following fields are common to all Remapped Encodings:
+
+| Field Name | Field bits | Description |
+|------------|------------|----------------------------------------|
+| MASKMODE | `0` | Execution (predication) Mask Kind |
+| MASK | `1:3` | Execution Mask |
+| SUBVL | `8:9` | Sub-vector length |
+
+The following fields are optional or encoded differently depending
+on context after decoding of the Scalar suffix:
+
+| Field Name | Field bits | Description |
+|------------|------------|----------------------------------------|
+| ELWIDTH | `4:5` | Element Width |
+| ELWIDTH_SRC | `6:7` | Element Width for Source |
+| EXTRA | `10:18` | Register Extra encoding |
+| MODE | `19:23` | changes Vector behaviour |
+
+* MODE changes the behaviour of the SV operation (result saturation, mapreduce)
+* SUBVL groups elements together into vec2, vec3, vec4 for use in 3D and Audio/Video DSP work
+* ELWIDTH and ELWIDTH_SRC overrides the instruction's destination and source operand width
+* MASK (and MASK_SRC) and MASKMODE provide predication (two types of sources: scalar INT and Vector CR).
+* Bits 10 to 18 (EXTRA) are further decoded depending on the RM category for the instruction, which is determined only by decoding the Scalar 32 bit suffix.
+
+Similar to Power ISA `X-Form` etc. EXTRA bits are given designations, such as `RM-1P-3S1D` which indicates for this example that the operation is to be single-predicated and that there are 3 source operand EXTRA tags and one destination operand tag.
+
+Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance or increased latency in some implementations due to lane-crossing.
+
+# Mode
+
+Mode is an augmentation of SV behaviour. Different types of
+instructions have different needs, similar to Power ISA
+v3.1 64 bit prefix 8LS and MTRR formats apply to different
+instruction types. Modes include Reduction, Iteration, arithmetic
+saturation, and Fail-First. More specific details in each
+section and in the [[svp64/appendix]]
+
+* For condition register operations see [[sv/cr_ops]]
+* For LD/ST Modes, see [[sv/ldst]].
+* For Branch modes, see [[sv/branches]]
+* For arithmetic and logical, see [[sv/normal]]
+
+# ELWIDTH Encoding
+
+Default behaviour is set to 0b00 so that zeros follow the convention of
+`scalar identity behaviour`. In this case it means that elwidth overrides
+are not applicable. Thus if a 32 bit instruction operates on 32 bit,
+`elwidth=0b00` specifies that this behaviour is unmodified. Likewise
+when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00`
+states that, again, the behaviour is not to be modified.
+
+Only when elwidth is nonzero is the element width overridden to the
+explicitly required value.
+
+## Elwidth for Integers:
+
+| Value | Mnemonic | Description |
+|-------|----------------|------------------------------------|
+| 00 | DEFAULT | default behaviour for operation |
+| 01 | `ELWIDTH=w` | Word: 32-bit integer |
+| 10 | `ELWIDTH=h` | Halfword: 16-bit integer |
+| 11 | `ELWIDTH=b` | Byte: 8-bit integer |
+
+This encoding is chosen such that the byte width may be computed as
+`8<<(3-ew)`
+
+## Elwidth for FP Registers:
+
+| Value | Mnemonic | Description |
+|-------|----------------|------------------------------------|
+| 00 | DEFAULT | default behaviour for FP operation |
+| 01 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point |
+| 10 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point |
+| 11 | `ELWIDTH=bf16` | Reserved for `bf16` |
+
+Note:
+[`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)
+is reserved for a future implementation of SV
+
+Note that any IEEE754 FP operation in Power ISA ending in "s" (`fadds`) shall
+perform its operation at **half** the ELWIDTH then padded back out
+to ELWIDTH. `sv.fadds/ew=f32` shall perform an IEEE754 FP16 operation that is then "padded" to fill out to an IEEE754 FP32. When ELWIDTH=DEFAULT
+clearly the behaviour of `sv.fadds` is performed at 32-bit accuracy
+then padded back out to fit in IEEE754 FP64, exactly as for Scalar
+v3.0B "single" FP. Any FP operation ending in "s" where ELWIDTH=f16
+or ELWIDTH=bf16 is reserved and must raise an illegal instruction
+(IEEE754 FP8 or BF8 are not defined).
+
+## Elwidth for CRs:
+
+Element-width overrides for CR Fields has no meaning. The bits
+are therefore used for other purposes, or when Rc=1, the Elwidth
+applies to the result being tested (a GPR or FPR), but not to the
+Vector of CR Fields.
+
+# SUBVL Encoding
+
+the default for SUBVL is 1 and its encoding is 0b00 to indicate that
+SUBVL is effectively disabled (a SUBVL for-loop of only one element). this
+lines up in combination with all other "default is all zeros" behaviour.
+
+| Value | Mnemonic | Subvec | Description |
+|-------|-----------|---------|------------------------|
+| 00 | `SUBVL=1` | single | Sub-vector length of 1 |
+| 01 | `SUBVL=2` | vec2 | Sub-vector length of 2 |
+| 10 | `SUBVL=3` | vec3 | Sub-vector length of 3 |
+| 11 | `SUBVL=4` | vec4 | Sub-vector length of 4 |
+
+The SUBVL encoding value may be thought of as an inclusive range of a
+sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore
+this may be considered to be elements 0b00 to 0b01 inclusive.
+
+# MASK/MASK_SRC & MASKMODE Encoding
+
+TODO: rename MASK_KIND to MASKMODE
+
+One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two
+types may not be mixed.
+
+Special note: to disable predication this field must
+be set to zero in combination with Integer Predication also being set
+to 0b000. this has the effect of enabling "all 1s" in the predicate
+mask, which is equivalent to "not having any predication at all"
+and consequently, in combination with all other default zeros, fully
+disables SV (`scalar identity behaviour`).
+
+`MASKMODE` may be set to one of 2 values:
+
+| Value | Description |
+|-----------|------------------------------------------------------|
+| 0 | MASK/MASK_SRC are encoded using Integer Predication |
+| 1 | MASK/MASK_SRC are encoded using CR-based Predication |
+
+Integer Twin predication has a second set of 3 bits that uses the same
+encoding thus allowing either the same register (r3, r10 or r31) to be used
+for both src and dest, or different regs (one for src, one for dest).
+
+Likewise CR based twin predication has a second set of 3 bits, allowing
+a different test to be applied.
+
+Note that it is assumed that Predicate Masks (whether INT or CR)
+are read *before* the operations proceed. In practice (for CR Fields)
+this creates an unnecessary block on parallelism. Therefore,
+it is up to the programmer to ensure that the CR fields used as
+Predicate Masks are not being written to by any parallel Vector Loop.
+Doing so results in **UNDEFINED** behaviour, according to the definition
+outlined in the Power ISA v3.0B Specification.
+
+Hardware Implementations are therefore free and clear to delay reading
+of individual CR fields until the actual predicated element operation
+needs to take place, safe in the knowledge that no programmer will
+have issued a Vector Instruction where previous elements could have
+overwritten (destroyed) not-yet-executed CR-Predicated element operations.
+
+## Integer Predication (MASKMODE=0)
+
+When the predicate mode bit is zero the 3 bits are interpreted as below.
+Twin predication has an identical 3 bit field similarly encoded.
+
+`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the following meaning:
+
+| Value | Mnemonic | Element `i` enabled if: |
+|-------|----------|------------------------------|
+| 000 | ALWAYS | predicate effectively all 1s |
+| 001 | 1 << R3 | `i == R3` |
+| 010 | R3 | `R3 & (1 << i)` is non-zero |
+| 011 | ~R3 | `R3 & (1 << i)` is zero |
+| 100 | R10 | `R10 & (1 << i)` is non-zero |
+| 101 | ~R10 | `R10 & (1 << i)` is zero |
+| 110 | R30 | `R30 & (1 << i)` is non-zero |
+| 111 | ~R30 | `R30 & (1 << i)` is zero |
+
+r10 and r30 are at the high end of temporary and unused registers, so as not to interfere with register allocation from ABIs.
+
+## CR-based Predication (MASKMODE=1)
+
+When the predicate mode bit is one the 3 bits are interpreted as below.
+Twin predication has an identical 3 bit field similarly encoded.
+
+`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the following meaning:
+
+| Value | Mnemonic | Element `i` is enabled if |
+|-------|----------|--------------------------|
+| 000 | lt | `CR[offs+i].LT` is set |
+| 001 | nl/ge | `CR[offs+i].LT` is clear |
+| 010 | gt | `CR[offs+i].GT` is set |
+| 011 | ng/le | `CR[offs+i].GT` is clear |
+| 100 | eq | `CR[offs+i].EQ` is set |
+| 101 | ne | `CR[offs+i].EQ` is clear |
+| 110 | so/un | `CR[offs+i].FU` is set |
+| 111 | ns/nu | `CR[offs+i].FU` is clear |
+
+CR based predication. TODO: select alternate CR for twin predication? see
+[[discussion]] Overlap of the two CR based predicates must be taken
+into account, so the starting point for one of them must be suitably
+high, or accept that for twin predication VL must not exceed the range
+where overlap will occur, *or* that they use the same starting point
+but select different *bits* of the same CRs
+
+`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD).
+
+The CR Predicates chosen must start on a boundary that Vectorised
+CR operations can access cleanly, in full.
+With EXTRA2 restricting starting points
+to multiples of 8 (CR0, CR8, CR16...) both Vectorised Rc=1 and CR Predicate
+Masks have to be adapted to fit on these boundaries as well.
+
+# Extra Remapped Encoding <a name="extra_remap"> </a>
+
+Shows all instruction-specific fields in the Remapped Encoding `RM[10:18]` for all instruction variants. Note that due to the very tight space, the encoding mode is *not* included in the prefix itself. The mode is "applied", similar to Power ISA "Forms" (X-Form, D-Form) on a per-instruction basis, and, like "Forms" are given a designation (below) of the form `RM-nP-nSnD`. The full list of which instructions use which remaps is here [[opcode_regs_deduped]]. (*Machine-readable CSV files have been provided which will make the task of creating SV-aware ISA decoders easier*).
+
+These mappings are part of the SVP64 Specification in exactly the same
+way as X-Form, D-Form. New Scalar instructions added to the Power ISA
+will need a corresponding SVP64 Mapping, which can be derived by-rote
+from examining the Register "Profile" of the instruction.
+
+There are two categories: Single and Twin Predication.
+Due to space considerations further subdivision of Single Predication
+is based on whether the number of src operands is 2 or 3. With only
+9 bits available some compromises have to be made.
+
+* `RM-1P-3S1D` Single Predication dest/src1/2/3, applies to 4-operand instructions (fmadd, isel, madd).
+* `RM-1P-2S1D` Single Predication dest/src1/2 applies to 3-operand instructions (src1 src2 dest)
+* `RM-2P-1S1D` Twin Predication (src=1, dest=1)
+* `RM-2P-2S1D` Twin Predication (src=2, dest=1) primarily for LDST (Indexed)
+* `RM-2P-1S2D` Twin Predication (src=1, dest=2) primarily for LDST Update
+
+## RM-1P-3S1D
+
+| Field Name | Field bits | Description |
+|------------|------------|----------------------------------------|
+| Rdest\_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) |
+| Rsrc1\_EXTRA2 | `12:13` | extends Rsrc1 (R\*\_EXTRA2 Encoding) |
+| Rsrc2\_EXTRA2 | `14:15` | extends Rsrc2 (R\*\_EXTRA2 Encoding) |
+| Rsrc3\_EXTRA2 | `16:17` | extends Rsrc3 (R\*\_EXTRA2 Encoding) |
+| EXTRA2_MODE | `18` | used by `divmod2du` and `maddedu` for RS |
+
+These are for 3 operand in and either 1 or 2 out instructions.
+3-in 1-out includes `madd RT,RA,RB,RC`. (DRAFT) instructions
+such as `maddedu` have an implicit second destination, RS, the
+selection of which is determined by bit 18.
+
+## RM-1P-2S1D
+
+| Field Name | Field bits | Description |
+|------------|------------|-------------------------------------------|
+| Rdest\_EXTRA3 | `10:12` | extends Rdest |
+| Rsrc1\_EXTRA3 | `13:15` | extends Rsrc1 |
+| Rsrc2\_EXTRA3 | `16:18` | extends Rsrc3 |
+
+These are for 2 operand 1 dest instructions, such as `add RT, RA,
+RB`. However also included are unusual instructions with an implicit dest
+that is identical to its src reg, such as `rlwinmi`.
+
+Normally, with instructions such as `rlwinmi`, the scalar v3.0B ISA would not have sufficient bit fields to allow
+an alternative destination. With SV however this becomes possible.
+Therefore, the fact that the dest is implicitly also a src should not
+mislead: due to the *prefix* they are different SV regs.
+
+* `rlwimi RA, RS, ...`
+* Rsrc1_EXTRA3 applies to RS as the first src
+* Rsrc2_EXTRA3 applies to RA as the secomd src
+* Rdest_EXTRA3 applies to RA to create an **independent** dest.
+
+With the addition of the EXTRA bits, the three registers
+each may be *independently* made vector or scalar, and be independently
+augmented to 7 bits in length.
+
+## RM-2P-1S1D/2S
+
+| Field Name | Field bits | Description |
+|------------|------------|----------------------------|
+| Rdest_EXTRA3 | `10:12` | extends Rdest |
+| Rsrc1_EXTRA3 | `13:15` | extends Rsrc1 |
+| MASK_SRC | `16:18` | Execution Mask for Source |
+
+`RM-2P-2S` is for `stw` etc. and is Rsrc1 Rsrc2.
+
+## RM-1P-2S1D
+
+single-predicate, three registers (2 read, 1 write)
+
+| Field Name | Field bits | Description |
+|------------|------------|----------------------------|
+| Rdest_EXTRA3 | `10:12` | extends Rdest |
+| Rsrc1_EXTRA3 | `13:15` | extends Rsrc1 |
+| Rsrc2_EXTRA3 | `16:18` | extends Rsrc2 |
+
+## RM-2P-2S1D/1S2D/3S
+
+The primary purpose for this encoding is for Twin Predication on LOAD
+and STORE operations. see [[sv/ldst]] for detailed anslysis.
+
+RM-2P-2S1D:
+
+| Field Name | Field bits | Description |
+|------------|------------|----------------------------|
+| Rdest_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) |
+| Rsrc1_EXTRA2 | `12:13` | extends Rsrc1 (R\*\_EXTRA2 Encoding) |
+| Rsrc2_EXTRA2 | `14:15` | extends Rsrc2 (R\*\_EXTRA2 Encoding) |
+| MASK_SRC | `16:18` | Execution Mask for Source |
+
+Note that for 1S2P the EXTRA2 dest and src names are switched (Rsrc_EXTRA2
+is in bits 10:11, Rdest1_EXTRA2 in 12:13)
+
+Also that for 3S (to cover `stdx` etc.) the names are switched to 3 src: Rsrc1_EXTRA2, Rsrc2_EXTRA2, Rsrc3_EXTRA2.
+
+Note also that LD with update indexed, which takes 2 src and 2 dest
+(e.g. `lhaux RT,RA,RB`), does not have room for 4 registers and also
+Twin Predication. therefore these are treated as RM-2P-2S1D and the
+src spec for RA is also used for the same RA as a dest.
+
+Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance or increased latency in some implementations due to lane-crossing.
+
+# R\*\_EXTRA2/3
+
+EXTRA is the means by which two things are achieved:
+
+1. Registers are marked as either Vector *or Scalar*
+2. Register field numbers (limited typically to 5 bit)
+ are extended in range, both for Scalar and Vector.
+
+The register files are therefore extended:
+
+* INT is extended from r0-31 to r0-127
+* FP is extended from fp0-32 to fp0-fp127
+* CR Fields are extended from CR0-7 to CR0-127
+
+However due to pressure in `RM.EXTRA` not all these registers
+are accessible by all instructions, particularly those with
+a large number of operands (`madd`, `isel`).
+
+In the following tables register numbers are constructed from the
+standard v3.0B / v3.1B 32 bit register field (RA, FRA) and the EXTRA2
+or EXTRA3 field from the SV Prefix, determined by the specific
+RM-xx-yyyy designation for a given instruction.
+The prefixing is arranged so that
+interoperability between prefixing and nonprefixing of scalar registers
+is direct and convenient (when the EXTRA field is all zeros).
+
+A pseudocode algorithm explains the relationship, for INT/FP (see [[svp64/appendix]] for CRs)
+
+```
+ if extra3_mode:
+ spec = EXTRA3
+ else:
+ spec = EXTRA2 << 1 # same as EXTRA3, shifted
+ if spec[0]: # vector
+ return (RA << 2) | spec[1:2]
+ else: # scalar
+ return (spec[1:2] << 5) | RA
+```
+
+Future versions may extend to 256 by shifting Vector numbering up.
+Scalar will not be altered.
+
+Note that in some cases the range of starting points for Vectors
+is limited.
+
+## INT/FP EXTRA3
+
+If EXTRA3 is zero, maps to
+"scalar identity" (scalar Power ISA field naming).
+
+Fields are as follows:
+
+* Value: R_EXTRA3
+* Mode: register is tagged as scalar or vector
+* Range/Inc: the range of registers accessible from this EXTRA
+ encoding, and the "increment" (accessibility). "/4" means
+ that this EXTRA encoding may only give access (starting point)
+ every 4th register.
+* MSB..LSB: the bit field showing how the register opcode field
+ combines with EXTRA to give (extend) the register number (GPR)
+
+| Value | Mode | Range/Inc | 6..0 |
+|-----------|-------|---------------|---------------------|
+| 000 | Scalar | `r0-r31`/1 | `0b00 RA` |
+| 001 | Scalar | `r32-r63`/1 | `0b01 RA` |
+| 010 | Scalar | `r64-r95`/1 | `0b10 RA` |
+| 011 | Scalar | `r96-r127`/1 | `0b11 RA` |
+| 100 | Vector | `r0-r124`/4 | `RA 0b00` |
+| 101 | Vector | `r1-r125`/4 | `RA 0b01` |
+| 110 | Vector | `r2-r126`/4 | `RA 0b10` |
+| 111 | Vector | `r3-r127`/4 | `RA 0b11` |
+
+## INT/FP EXTRA2
+
+If EXTRA2 is zero will map to
+"scalar identity behaviour" i.e Scalar Power ISA register naming:
+
+| Value | Mode | Range/inc | 6..0 |
+|-----------|-------|---------------|-----------|
+| 00 | Scalar | `r0-r31`/1 | `0b00 RA` |
+| 01 | Scalar | `r32-r63`/1 | `0b01 RA` |
+| 10 | Vector | `r0-r124`/4 | `RA 0b00` |
+| 11 | Vector | `r2-r126`/4 | `RA 0b10` |
+
+**Note that unlike in EXTRA3, in EXTRA2**:
+
+* the GPR Vectors may only start from
+ `r0, r2, r4, r6, r8` and likewise FPR Vectors.
+* the GPR Scalars may only go from `r0, r1, r2.. r63` and likewise FPR Scalars.
+
+as there is insufficient bits to cover the full range.
+
+## CR Field EXTRA3
+
+CR Field encoding is essentially the same but made more complex due to CRs being bit-based. See [[svp64/appendix]] for explanation and pseudocode.
+Note that Vectors may only start from `CR0, CR4, CR8, CR12, CR16, CR20`...
+and Scalars may only go from `CR0, CR1, ... CR31`
+
+Encoding shown MSB down to LSB
+
+For a 5-bit operand (BA, BB, BT):
+
+| Value | Mode | Range/Inc | 8..5 | 4..2 | 1..0 |
+|-------|------|---------------|-----------| --------|---------|
+| 000 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[4:2] | BA[1:0] |
+| 001 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[4:2] | BA[1:0] |
+| 010 | Scalar | `CR16-CR23`/1 | 0b0010 | BA[4:2] | BA[1:0] |
+| 011 | Scalar | `CR24-CR31`/1 | 0b0011 | BA[4:2] | BA[1:0] |
+| 100 | Vector | `CR0-CR112`/16 | BA[4:2] 0 | 0b000 | BA[1:0] |
+| 101 | Vector | `CR4-CR116`/16 | BA[4:2] 0 | 0b100 | BA[1:0] |
+| 110 | Vector | `CR8-CR120`/16 | BA[4:2] 1 | 0b000 | BA[1:0] |
+| 111 | Vector | `CR12-CR124`/16 | BA[4:2] 1 | 0b100 | BA[1:0] |
+
+For a 3-bit operand (e.g. BFA):
+
+| Value | Mode | Range/Inc | 6..3 | 2..0 |
+|-------|------|---------------|-----------| --------|
+| 000 | Scalar | `CR0-CR7`/1 | 0b0000 | BFA |
+| 001 | Scalar | `CR8-CR15`/1 | 0b0001 | BFA |
+| 010 | Scalar | `CR16-CR23`/1 | 0b0010 | BFA |
+| 011 | Scalar | `CR24-CR31`/1 | 0b0011 | BFA |
+| 100 | Vector | `CR0-CR112`/16 | BFA 0 | 0b000 |
+| 101 | Vector | `CR4-CR116`/16 | BFA 0 | 0b100 |
+| 110 | Vector | `CR8-CR120`/16 | BFA 1 | 0b000 |
+| 111 | Vector | `CR12-CR124`/16 | BFA 1 | 0b100 |
+
+## CR EXTRA2
+
+CR encoding is essentially the same but made more complex due to CRs being bit-based. See separate section for explanation and pseudocode.
+Note that Vectors may only start from CR0, CR8, CR16, CR24, CR32...
+
+
+Encoding shown MSB down to LSB
+
+For a 5-bit operand (BA, BB, BC):
+
+| Value | Mode | Range/Inc | 8..5 | 4..2 | 1..0 |
+|-------|--------|----------------|---------|---------|---------|
+| 00 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[4:2] | BA[1:0] |
+| 01 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[4:2] | BA[1:0] |
+| 10 | Vector | `CR0-CR112`/16 | BA[4:2] 0 | 0b000 | BA[1:0] |
+| 11 | Vector | `CR8-CR120`/16 | BA[4:2] 1 | 0b000 | BA[1:0] |
+
+For a 3-bit operand (e.g. BFA):
+
+| Value | Mode | Range/Inc | 6..3 | 2..0 |
+|-------|------|---------------|-----------| --------|
+| 00 | Scalar | `CR0-CR7`/1 | 0b0000 | BFA |
+| 01 | Scalar | `CR8-CR15`/1 | 0b0001 | BFA |
+| 10 | Vector | `CR0-CR112`/16 | BFA 0 | 0b000 |
+| 11 | Vector | `CR8-CR120`/16 | BFA 1 | 0b000 |
+
+# Appendix
+
+Now at its own page: [[svp64/appendix]]
-TODO