From 3f1499bc0a4ca4bebcfdf243bf5729d67a1c16ba Mon Sep 17 00:00:00 2001 From: Luke Kenneth Casson Leighton Date: Sun, 2 Apr 2023 19:48:48 +0100 Subject: [PATCH] pull in edits from ls010 into svp64.mdwn --- openpower/sv/svp64.mdwn | 921 ++++++++++++++++++++++++++++------------ 1 file changed, 654 insertions(+), 267 deletions(-) diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn index bcdeec574..829474e20 100644 --- a/openpower/sv/svp64.mdwn +++ b/openpower/sv/svp64.mdwn @@ -1,6 +1,4 @@ -[[!tag standards]] - -# DRAFT SVP64 for Power ISA v3.0B +# RFC ls010 SVP64 Zero-Overhead Loop Prefix Subsystem * **DRAFT STATUS v0.1 18sep2021** Release notes @@ -30,8 +28,10 @@ Links: * * * TODO elwidth "infinite" discussion -* Saturating description. +* Saturating description. * TODO [[sv/svp64-single]] +* External RFC ls010 + Table of contents @@ -39,161 +39,485 @@ Table of contents # Introduction -This document focuses on the encoding of [[SV|sv]], and assumes familiarity with the same. It does not cover how SV works (merely the instruction encoding), and is therefore best read in conjunction with the [[sv/overview]], as well as the [[sv/svp64_quirks]] section. -It is also crucial to note that whilst this format augments instruction -behaviour it works in conjunction with SVSTATE and other [[sv/sprs]]. - -All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB -on the left -and counting up as you move rightwards to the LSB end). All bit ranges are inclusive -(so `4:6` means bits 4, 5, and 6, in MSB0 order). - -64-bit instructions are split into two 32-bit words, the prefix and the -suffix. The prefix always comes before the suffix in PC order. - -| 0:5 | 6:31 | 32:63 | -|--------|--------------|--------------| -| EXT01 | v3.1 Prefix | v3.0/1 Suffix | - -svp64 fits into the "reserved" portions of the v3.1 prefix, making it possible for svp64, v3.0B (or v3.1 including 64 bit prefixed) instructions to co-exist in the same binary without conflict. 
+Simple-V is a type of Vectorisation best described as a "Prefix Loop +Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR` instruction and +to the 8086 `REP` Prefix instruction. More advanced features are similar +to the Z80 `CPIR` instruction. If viewed one-dimensionally as an actual +Vector ISA it introduces over 1.5 million 64-bit Vector instructions. +SVP64, the instruction format used by Simple-V, is therefore best viewed +as an orthogonal RISC-paradigm "Prefixing" subsystem instead. + +Except where explicitly stated all bit numbers remain as in the rest of +the Power ISA: in MSB0 form (the bits are numbered from 0 at the MSB on +the left and counting up as you move rightwards to the LSB end). All bit +ranges are inclusive (so `4:6` means bits 4, 5, and 6, in MSB0 order). +**All register numbering and element numbering however is LSB0 ordering** +which is a different convention from that used elsewhere in the Power ISA. + +The SVP64 prefix always comes before the suffix in PC order and must be +considered an independent "Defined word" that augments the behaviour of +the following instruction, but does **not** change the actual Decoding +of that following instruction. **All prefixed instructions retain their +non-prefixed encoding and definition**. + +Two apparent exceptions to the above hard rule exist: SV Branch-Conditional +operations and LD/ST-update "Post-Increment" Mode. Post-Increment +was considered sufficiently high priority (significantly reducing hot-loop +instruction count) that one bit in the Prefix is reserved for it. +Vectorised Branch-Conditional operations "embed" the original Scalar +Branch-Conditional behaviour into a much more advanced variant that +is highly suited to High-Performance Computation (HPC), Supercomputing, +and parallel GPU Workloads. + +*Architectural Resource Allocation note: it is prohibited to accept RFCs +which fundamentally violate this hard requirement. 
Under no circumstances +must the Suffix space have an alternate instruction encoding allocated +within SVP64 that is entirely different from the non-prefixed Defined +Word. Hardware Implementors critically rely on this inviolate guarantee +to implement High-Performance Multi-Issue micro-architectures that can +sustain 100% throughput* Subset implementations in hardware are permitted, as long as certain rules are followed, allowing for full soft-emulation including future -revisions. Details in the [[svp64/appendix]]. +revisions. Compliancy Subsets exist to ensure minimum levels of binary +interoperability expectations within certain environments. Details in +the [[svp64/appendix]]. ## SVP64 encoding features -A number of features need to be compacted into a very small space of only 24 bits: +A number of features need to be compacted into a very small space of +only 24 bits: -* Independent per-register Scalar/Vector tagging and range extension on every register +* Independent per-register Scalar/Vector tagging and range extension on + every register * Element width overrides on both source and destination * Predication on both source and destination * Two different sources of predication: INT and CR Fields -* SV Modes including saturation (for Audio, Video and DSP), mapreduce, fail-first and - predicate-result mode. +* SV Modes including saturation (for Audio, Video and DSP), mapreduce, + fail-first and predicate-result mode. -This document focusses specifically on how that fits into available space. The [[svp64/appendix]] explains more of the details, whilst the [[sv/overview]] gives the basics. +Different classes of operations require different formats. The earlier +sections cover the common formats and the four separate modes follow: +CR operations (crops), Arithmetic/Logical (termed "normal"), Load/Store +and Branch-Conditional. -# Definition of Reserved in this spec. +## Definition of Reserved in this spec. 
For the new fields added in SVP64, instructions that have any of their fields set to a reserved value must cause an illegal instruction trap, -to allow emulation of future instruction sets, or for subsets of SVP64 -to be implemented in hardware and the rest emulated. -This includes SVP64 SPRs: reading or writing values which are not -supported in hardware must also raise illegal instruction traps -in order to allow emulation. +to allow emulation of future instruction sets, or for subsets of SVP64 to +be implemented in hardware and the rest emulated. This includes SVP64 +SPRs: reading or writing values which are not supported in hardware +must also raise illegal instruction traps in order to allow emulation. Unless otherwise stated, reserved values are always all zeros. -This is unlike OpenPower ISA v3.1, which in many instances does not require a trap if reserved fields are nonzero. Where the standard Power ISA definition -is intended the red keyword `RESERVED` is used. - -# Scalar Identity Behaviour - -SVP64 is designed so that when the prefix is all zeros, and - VL=1, no effect or -influence occurs (no augmentation) such that all standard Power ISA -v3.0/v3 1 instructions covered by the prefix are "unaltered". This is termed `scalar identity behaviour` (based on the mathematical definition for "identity", as in, "identity matrix" or better "identity transformation"). - -Note that this is completely different from when VL=0. VL=0 turns all operations under its influence into `nops` (regardless of the prefix) - whereas when VL=1 and the SV prefix is all zeros, the operation simply acts as if SV had not been applied at all to the instruction (an "identity transformation"). - -# Register Naming and size - -SV Registers are simply the INT, FP and CR register files extended -linearly to larger sizes; SV Vectorisation iterates sequentially through these registers. 
- -Where the integer regfile in standard scalar -Power ISA v3.0B/v3.1B is r0 to r31, SV extends this as r0 to r127. -Likewise FP registers are extended to 128 (fp0 to fp127), and CR Fields -are -extended to 128 entries, CR0 thru CR127. +This is unlike OpenPower ISA v3.1, which in many instances does not +require a trap if reserved fields are nonzero. Where the standard Power +ISA definition is intended the red keyword `RESERVED` is used. + +## Definition of "UnVectoriseable" + +Any operation that inherently makes no sense if repeated is termed +"UnVectoriseable" or "UnVectorised". Examples include `sc` or `sync` +which have no registers. `mtmsr` is also classed as UnVectoriseable +because there is only one `MSR`. + +## Register files, elements, and Element-width Overrides + +In the Upper Compliancy Levels of SVP64 the size of the GPR and FPR +Register files are expanded from 32 to 128 entries, and the number of +CR Fields expanded from CR0-CR7 to CR0-CR127. (Note: A future version +of SVP64 is anticipated to extend the VSR register file). + +Memory access remains exactly the same: the effects of `MSR.LE` remain +exactly the same, affecting as they already do and remain **only** +on the Load and Store memory-register operation byte-order, and having +nothing to do with the ordering of the contents of register files or +register-register operations. 
+ +To be absolutely clear: + +``` + There are no conceptual arithmetic ordering or other changes over the + Scalar Power ISA definitions to registers or register files or to + arithmetic or Logical Operations beyond element-width subdivision +``` + +Element offset +numbering is naturally **LSB0-sequentially-incrementing from zero, not +MSB0-incrementing** including when element-width overrides are used, +at which point the elements progress through each register +sequentially from the LSB end +(confusingly numbered the highest in MSB0 ordering) and progress +incrementally to the MSB end (confusingly numbered the lowest in +MSB0 ordering). + +When exclusively using MSB0-numbering, SVP64 +becomes unnecessarily complex to both express and subsequently understand: +the required conditional subtractions from 63, +31, 15 and 7 needed to express the fact that elements are LSB0-sequential +unfortunately become a hostile minefield, obscuring both +intent and meaning. Therefore for the +purposes of this section the more natural **LSB0 numbering is assumed** +and it is left to the reader to translate to MSB0 numbering. + +The Canonical specification for how element-sequential numbering and +element-width overrides is defined is expressed in the following c +structure, assuming a Little-Endian system, and naturally using LSB0 +numbering everywhere because the ANSI c specification is inherently LSB0. 
+Note the deliberate similarity to how VSX register elements are defined: + +``` + #pragma pack + typedef union { + uint8_t bytes[]; // elwidth 8 + uint16_t hwords[]; // elwidth 16 + uint32_t words[]; // elwidth 32 + uint64_t dwords[]; // elwidth 64 + uint8_t actual_bytes[8]; + } el_reg_t; + + el_reg_t int_regfile[128]; + + void get_register_element(el_reg_t* el, int gpr, int element, int width) { + switch (width) { + case 64: el->dwords[0] = int_regfile[gpr].dwords[element]; break; + case 32: el->words[0] = int_regfile[gpr].words[element]; break; + case 16: el->hwords[0] = int_regfile[gpr].hwords[element]; break; + case 8 : el->bytes[0] = int_regfile[gpr].bytes[element]; break; + } + } + void set_register_element(el_reg_t* el, int gpr, int element, int width) { + switch (width) { + case 64: int_regfile[gpr].dwords[element] = el->dwords[0]; break; + case 32: int_regfile[gpr].words[element] = el->words[0]; break; + case 16: int_regfile[gpr].hwords[element] = el->hwords[0]; break; + case 8 : int_regfile[gpr].bytes[element] = el->bytes[0]; break; + } + } +``` + +Example Vector-looped add operation implementation when elwidths are 64-bit: + +``` + # vector-add RT, RA, RB using the "uint64_t" union member, "dwords" + for i in range(VL): + int_regfile[RT].dwords[i] = int_regfile[RA].dwords[i] + int_regfile[RB].dwords[i] +``` + +However if elwidth overrides are set to 16 for both source and destination: + +``` + # vector-add RT, RA, RB using the "uint16_t" union member "hwords" + for i in range(VL): + int_regfile[RT].hwords[i] = int_regfile[RA].hwords[i] + int_regfile[RB].hwords[i] +``` + +Hardware Architectural note: to avoid a Read-Modify-Write at the register +file it is strongly recommended to implement byte-level write-enable lines +exactly as has been implemented in DRAM ICs for many decades. Additionally +the predicate mask bit is advised to be associated with the element +operation and alongside the result ultimately passed to the register file.
+When element-width is set to 64-bit the relevant predicate mask bit +may be repeated eight times and pull all eight write-port byte-level +lines HIGH. Clearly when element-width is set to 8-bit the relevant +predicate mask bit corresponds directly with one single byte-level +write-enable line. It is up to the Hardware Architect to then amortise +(merge) elements together into both PredicatedSIMD Pipelines as well +as simultaneous non-overlapping Register File writes, to achieve High +Performance designs. + +**Comparative equivalent using VSR registers** + +For a comparative data point the VSR Registers may be expressed in the +same fashion. The c code below is directly an expression of Figure 97 in +Power ISA Public v3.1 Book I Section 6.3 page 258, *after compensating for +MSB0 numbering in both bits and elements, adapting in full to LSB0 numbering, +and obeying LE ordering*. + +**Crucial to understanding why the subtraction from 1,3,7,15 is present +is the fact that VSX Registers also number elements in MSB0 order**. SVP64 +very specifically numbers elements in **LSB0** order with the first +element being at the **LSB** end of the register, where VSX places +the numerically-lowest element at the **MSB** end of the register.
+ +``` + #pragma pack + typedef union { + uint8_t bytes[16]; // elwidth 8, QTY 16 FIXED total + uint16_t hwords[8]; // elwidth 16, QTY 8 FIXED total + uint32_t words[4]; // elwidth 32, QTY 4 FIXED total + uint64_t dwords[2]; // elwidth 64, QTY 2 FIXED total + uint8_t actual_bytes[16]; // totals 128-bit + } el_reg_t; + + el_reg_t VSR_regfile[64]; + + static void check_num_elements(int elt, int width) { + switch (width) { + case 64: assert(elt < 2); break; + case 32: assert(elt < 4); break; + case 16: assert(elt < 8); break; + case 8 : assert(elt < 16); break; + } + } + void get_VSR_element(el_reg_t* el, int gpr, int elt, int width) { + check_num_elements(elt, width); + switch (width) { + case 64: el->dwords[0] = VSR_regfile[gpr].dwords[1-elt]; break; + case 32: el->words[0] = VSR_regfile[gpr].words[3-elt]; break; + case 16: el->hwords[0] = VSR_regfile[gpr].hwords[7-elt]; break; + case 8 : el->bytes[0] = VSR_regfile[gpr].bytes[15-elt]; break; + } + } + void set_VSR_element(el_reg_t* el, int gpr, int elt, int width) { + check_num_elements(elt, width); + switch (width) { + case 64: VSR_regfile[gpr].dwords[1-elt] = el->dwords[0]; break; + case 32: VSR_regfile[gpr].words[3-elt] = el->words[0]; break; + case 16: VSR_regfile[gpr].hwords[7-elt] = el->hwords[0]; break; + case 8 : VSR_regfile[gpr].bytes[15-elt] = el->bytes[0]; break; + } + } +``` + +For VSR Registers one key difference is that the overlay of different element +widths is clearly a *bounded static quantity*, whereas for Simple-V the +elements are +unrestrained and permitted to flow into *successive underlying Scalar registers*. +This difference is absolutely critical to a full understanding of the entire +Simple-V paradigm and why element-ordering, bit-numbering *and register numbering* +are all so strictly defined. + +Implementations are not permitted to violate the Canonical definition. Software +will be critically relying on the wrapped (overflow) behaviour inherently +implied by the unbounded variable-length c arrays.
+ +Illustrating the exact same loop with the exact same effect as achieved by Simple-V +we are first forced to create wrapper functions, to cater for the fact +that VSR register elements are static bounded: + +``` + int calc_VSR_reg_offs(int elt, int width) { + switch (width) { + case 64: return floor(elt / 2); + case 32: return floor(elt / 4); + case 16: return floor(elt / 8); + case 8 : return floor(elt / 16); + } + } + int calc_VSR_elt_offs(int elt, int width) { + switch (width) { + case 64: return (elt % 2); + case 32: return (elt % 4); + case 16: return (elt % 8); + case 8 : return (elt % 16); + } + } + void _get_VSR_element(el_reg_t* el, int gpr, int elt, int width) { + int new_elt = calc_VSR_elt_offs(elt, width); + int new_reg = calc_VSR_reg_offs(elt, width); + get_VSR_element(el, gpr+new_reg, new_elt, width); + } + void _set_VSR_element(el_reg_t* el, int gpr, int elt, int width) { + int new_elt = calc_VSR_elt_offs(elt, width); + int new_reg = calc_VSR_reg_offs(elt, width); + set_VSR_element(el, gpr+new_reg, new_elt, width); + } +``` + +And finally use these functions: + +``` + # VSX-add RT, RA, RB using the "uint16_t" union member "hwords" + for i in range(VL): + el_reg_t result, ra, rb; + _get_VSR_element(&ra, RA, i, 16); + _get_VSR_element(&rb, RB, i, 16); + result.hwords[0] = ra.hwords[0] + rb.hwords[0]; // use array 0 elements + _set_VSR_element(&result, RT, i, 16); + +``` + +## Scalar Identity Behaviour + +SVP64 is designed so that when the prefix is all zeros, and VL=1, no +effect or influence occurs (no augmentation) such that all standard Power +ISA v3.0/v3.1 instructions covered by the prefix are "unaltered". This +is termed `scalar identity behaviour` (based on the mathematical +definition for "identity", as in, "identity matrix" or better "identity +transformation"). + +Note that this is completely different from when VL=0. VL=0 turns all +operations under its influence into `nops` (regardless of the prefix) +whereas when VL=1 and the SV prefix is all zeros, the operation simply +acts as if SV had not been applied at all to the instruction (an +"identity transformation").
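Editorial sketch (not part of the patch): the difference between VL=0 and VL=1 under an all-zero prefix can be illustrated with a deliberately simplified C model. The names `sv_add`, `gpr`, `NUM_GPRS` and the single `vectorised` flag are hypothetical and introduced only for this illustration; mixed Scalar/Vector tagging, predication and element-width overrides are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GPRS 128
static uint64_t gpr[NUM_GPRS];

/* Hypothetical, simplified model of a prefixed 64-bit add.
 * vectorised == 0 stands in for an all-zero prefix (all operands Scalar);
 * vectorised == 1 stands in for all three operands tagged as Vectors. */
void sv_add(unsigned rt, unsigned ra, unsigned rb,
            int vectorised, unsigned vl) {
    for (unsigned i = 0; i < vl; i++) {        /* VL=0: body never runs (nop) */
        unsigned step = vectorised ? i : 0;    /* Scalars do not advance */
        assert(rt + step < NUM_GPRS);
        assert(ra + step < NUM_GPRS);
        assert(rb + step < NUM_GPRS);
        gpr[rt + step] = gpr[ra + step] + gpr[rb + step];
    }
}
```

In this model, calling `sv_add(3, 1, 2, 0, 1)` performs exactly one ordinary scalar add, while the same call with `vl` set to 0 leaves the register file untouched.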
+ +The fact that `VL` is dynamic and can be set to any value at runtime based +on program conditions and behaviour means very specifically that +`scalar identity behaviour` is **not** a redundant encoding. If the +only means by which VL could be set was by way of static-compiled +immediates then this assertion would be false. VL should not +be confused with MAXVL when understanding this key aspect of SimpleV. + +## Register Naming and size + +As indicated above SV Registers are simply the GPR, FPR and CR +register files extended linearly to larger sizes; SV Vectorisation +iterates sequentially through these registers (LSB0 sequential ordering +from 0 to VL-1). + +Where the integer regfile in standard scalar Power ISA v3.0B/v3.1B is +r0 to r31, SV extends this as r0 to r127. Likewise FP registers are +extended to 128 (fp0 to fp127), and CR Fields are extended to 128 entries, +CR0 thru CR127. The names of the registers therefore reflects a simple linear extension of the Power ISA v3.0B / v3.1B register naming, and in hardware this would be reflected by a linear increase in the size of the underlying SRAM used for the regfiles. -Note: when an EXTRA field (defined below) is zero, SV is deliberately designed -so that the register fields are identical to as if SV was not in effect -i.e. under these circumstances (EXTRA=0) the register field names RA, -RB etc. are interpreted and treated as v3.0B / v3.1B scalar registers. This is part of -`scalar identity behaviour` described above. +Note: when an EXTRA field (defined below) is zero, SV is deliberately +designed so that the register fields are identical to as if SV was not in +effect i.e. under these circumstances (EXTRA=0) the register field names +RA, RB etc. are interpreted and treated as v3.0B / v3.1B scalar registers. +This is part of `scalar identity behaviour` described above. 
+ +**Condition Register(s)** + +The Scalar Power ISA Condition Register is a 64 bit register where the top +32 MSBs (numbered 0:31 in MSB0 numbering) are not used. This convention is +*preserved* +in SVP64 and an additional 15 Condition Registers are provided in +order to store the new CR Fields, CR8-CR15, CR16-CR23 etc. sequentially. +The top 32 MSBs in each new SVP64 Condition Register are *also* not used: +only the bottom 32 bits (numbered 32:63 in MSB0 numbering) are used. + +*Programmer's note: using `sv.mfcr` without element-width overrides +to take into account the fact that the top 32 MSBs are zero and thus +effectively doubling the number of GPR registers required to hold all 128 +CR Fields would seem the only option because normally elwidth overrides +would halve the capacity of the instruction. However in this case it +is possible to use destination element-width overrides (for `sv.mfcr`; +source overrides would be used on the GPR of `sv.mtocrf`), whereupon +truncation of the 64-bit Condition Register(s) occurs, throwing away +the zeros and storing the remaining (valid, desired) 32-bit values +sequentially into (LSB0-convention) lower-numbered and upper-numbered +halves of GPRs respectively. The programmer is expected to be aware +however that the full width of the entire 64-bit Condition Register +is considered to be "an element". This is **not** like any other +Condition-Register instructions because all other CR instructions, +on closer investigation, will be observed to all be CR-bit or CR-Field +related. Thus a `VL` of 16 must be used.* ## Future expansion. With the way that EXTRA fields are defined and applied to register fields, -future versions of SV may involve 256 or greater registers.
Backwards +binary compatibility may be achieved with a PCR bit (Program Compatibility +Register) or an MSR bit analogous to SF. +Further discussion is out of scope for this version of SVP64. + +Additionally, a future variant of SVP64 will be applied to the Scalar +(Quad-precision and 128-bit) VSX instructions. Element-width overrides +are an opportunity to expand a future version of the Power ISA +to 256-bit, 512-bit and +1024-bit operations, as well as doubling or quadrupling the number +of VSX registers to 128 or 256. Again further discussion is out of +scope for this version of SVP64. + +-------- + +\newpage{} + +# New 64-bit Instruction Encoding spaces + +The following seven new areas are defined within Primary Opcode 9 (EXT009) +as a new 64-bit encoding space, alongside Primary Opcode 1 +(EXT1xx). + +| 0-5 | 6 | 7 | 8-31 | 32| Description | +|-----|---|---|-------|---|------------------------------------| +| PO | 0 | x | xxxx | 0 | `RESERVED2` (57-bit) | +| PO | 0 | 0 | !zero | 1 | SVP64Single:EXT232-263, or `RESERVED3` | +| PO | 0 | 0 | 0000 | 1 | Scalar EXT232-263 | +| PO | 0 | 1 | nnnn | 1 | SVP64:EXT232-263 | +| PO | 1 | 0 | 0000 | x | `RESERVED1` (32-bit) | +| PO | 1 | 0 | !zero | n | SVP64Single:EXT000-063 or `RESERVED4` | +| PO | 1 | 1 | nnnn | n | SVP64:EXT000-063 | + +Note that for the future SVP64Single Encoding (currently RESERVED3 and 4) +it is prohibited to have bits 8-31 be zero, unlike for SVP64 Vector space, +for which bits 8-31 can be zero (termed `scalar identity behaviour`). This +prohibition allows SVP64Single to share its Encoding space with Scalar +Ext232-263 and Scalar EXT300-363. + +Also that RESERVED1 and 2 are candidates for future Major opcode +areas EXT200-231 and EXT300-363 respectively, however as RESERVED areas +they may equally be allocated entirely differently. 
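Editorial sketch (not part of the patch): the seven-row table above can be read as a small decision tree on bits 6, 7, 8-31 and 32. The C below is one possible reading of it, assuming the prefix and suffix are held as two host-order 32-bit values in which MSB0 bit 0 is the most significant bit. The enum and the function name `classify_po9` are hypothetical.

```c
#include <stdint.h>

typedef enum {
    NOT_PO9,             /* Primary Opcode is not 9 */
    RESERVED2,           /* bit 6 = 0, bit 32 = 0 */
    SVP64SINGLE_EXT2XX,  /* SVP64Single:EXT232-263 or RESERVED3 */
    SCALAR_EXT2XX,       /* Scalar EXT232-263 */
    SVP64_EXT2XX,        /* SVP64:EXT232-263 */
    RESERVED1,           /* bit 6 = 1, bit 7 = 0, bits 8-31 zero */
    SVP64SINGLE_EXT0XX,  /* SVP64Single:EXT000-063 or RESERVED4 */
    SVP64_EXT0XX         /* SVP64:EXT000-063 */
} po9_area_t;

po9_area_t classify_po9(uint32_t prefix, uint32_t suffix) {
    uint32_t po    = prefix >> 26;         /* MSB0 bits 0:5 */
    uint32_t bit6  = (prefix >> 25) & 1;
    uint32_t bit7  = (prefix >> 24) & 1;
    uint32_t rm    = prefix & 0xFFFFFFu;   /* MSB0 bits 8:31 */
    uint32_t bit32 = suffix >> 31;         /* first bit of the suffix */

    if (po != 9) return NOT_PO9;
    if (bit6 == 0) {
        if (bit32 == 0) return RESERVED2;
        if (bit7 == 1)  return SVP64_EXT2XX;
        return rm == 0 ? SCALAR_EXT2XX : SVP64SINGLE_EXT2XX;
    }
    if (bit7 == 1) return SVP64_EXT0XX;
    return rm == 0 ? RESERVED1 : SVP64SINGLE_EXT0XX;
}
```

Note that only the first eight bits of the prefix and one bit of the suffix are needed to identify the area, and that an all-zero bits 8-31 is legal only in the SVP64 Vector rows.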
+ +*Architectural Resource Allocation Note: **under no circumstances** must +different Defined Words be allocated within any `EXT{z}` prefixed +or unprefixed space for a given value of `z`. Even if UnVectoriseable +an instruction Defined Word space must have the exact same Instruction +and exact same Instruction Encoding in all spaces (including +being RESERVED if UnVectoriseable) or not be allocated at all. +This is required as an inviolate hard rule governing Primary Opcode 9 +that may not be revoked under any circumstances. A useful way to think +of this is that the Prefix Encoding is, like the 8086 REP instruction, +an independent 32-bit Defined Word. The only semi-exceptions are +the Post-Increment Mode of LD/ST-Update and Vectorised Branch-Conditional.* + +Encoding spaces and their potential are illustrated: + +| Encoding | Available bits | Scalar | Vectoriseable | SVP64Single | +|----------|----------------|--------|---------------|--------------| +|EXT000-063| 32 | yes | yes |yes | +|EXT100-163| 64 | yes | no |no | +|RESERVED2 | 57 | N/A |not applicable |not applicable| +|EXT232-263| 32 | yes | yes |yes | +|RESERVED1 | 32 | N/A | no |no | + +Notes: + +* Prefixed-Prefixed (96-bit) instructions are prohibited. EXT1xx is + thus inherently UnVectoriseable as the EXT1xx prefix is 32-bit + on top of an SVP64 prefix which is 32-bit on top of a Defined Word + and the complexity at the Decoder becomes too great for High + Performance Multi-Issue systems. +* RESERVED2 presently remains unallocated as of yet and therefore its + potential is not yet defined (Not Applicable). +* RESERVED1 is also unallocated at present, but it is known in advance + that the area is UnVectoriseable and also cannot be Prefixed with + SVP64Single. +* Considerable care is needed both on Architectural Resource Allocation + as well as instruction design itself. 
Once an instruction is allocated + in an UnVectoriseable area it can never be Vectorised without providing + an entirely new Encoding. # Remapped Encoding (`RM[0:23]`) -To allow relatively easy remapping of which portions of the Prefix Opcode -Map are used for SVP64 without needing to rewrite a large portion of the -SVP64 spec, a mapping is defined from the OpenPower v3.1 prefix bits to -a new 24-bit Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` -at the LSB. +In the SVP64 Vector Prefix spaces, the 24 bits 8-31 are termed `RM`. Bits +32-37 are the Primary Opcode of the Suffix "Defined Word". 38-63 are the +remainder of the Defined Word. Note that for the new EXT232-263 SVP64 area +it is mandatory that bit 32 is set to 1. -The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding -is defined in the Prefix Fields section. +| 0-5 | 6 | 7 | 8-31 | 32-37 | 38-63 |Description | +|-----|---|---|----------|--------|----------|-----------------------| +| PO | 0 | 1 | RM[0:23] | 1nnnnn | xxxxxxxx | SVP64:EXT232-263 | +| PO | 1 | 1 | RM[0:23] | nnnnnn | xxxxxxxx | SVP64:EXT000-063 | -## Prefix Opcode Map (64-bit instruction encoding) - -In the original table in the v3.1B Power ISA Spec on p1350, Table 12, prefix bits 6:11 are shown, with their allocations to different v3.1B pregix "modes". - -The table below hows both PowerISA v3.1 instructions as well as new SVP instructions fit; -empty spaces are yet-to-be-allocated Illegal Instructions.
- -| 6:11 | ---000 | ---001 | ---010 | ---011 | ---100 | ---101 | ---110 | ---111 | -|------|--------|--------|--------|--------|--------|--------|--------|--------| -|000---| 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | 8LS | -|001---| | | | | | | | | -|010---| 8RR | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| -|011---| | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| -|100---| MLS | MLS | MLS | MLS | MLS | MLS | MLS | MLS | -|101---| | | | | | | | | -|110---| MRR | | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| -|111---| | MMIRR | | | `SVP64`| `SVP64`| `SVP64`| `SVP64`| - -Note that by taking up a block of 16, where in every case bits 7 and 9 are set, this allows svp64 to utilise four bits of the v3.1B Prefix space and "allocate" them to svp64's Remapped Encoding field, instead. - -## Prefix Fields - -To "activate" svp64 (in a way that does not conflict with v3.1B 64 bit Prefix mode), fields within the v3.1B Prefix Opcode Map are set -(see Prefix Opcode Map, above), leaving 24 bits "free" for use by SV. -This is achieved by setting bits 7 and 9 to 1: - -| Name | Bits | Value | Description | -|------------|---------|-------|--------------------------------| -| EXT01 | `0:5` | `1` | Indicates Prefixed 64-bit | -| `RM[0]` | `6` | | Bit 0 of Remapped Encoding | -| SVP64_7 | `7` | `1` | Indicates this is SVP64 | -| `RM[1]` | `8` | | Bit 1 of Remapped Encoding | -| SVP64_9 | `9` | `1` | Indicates this is SVP64 | -| `RM[2:23]` | `10:31` | | Bits 2-23 of Remapped Encoding | - -Laid out bitwise, this is as follows, showing how the 32-bits of the prefix -are constructed: - -| 0:5 | 6 | 7 | 8 | 9 | 10:31 | -|--------|-------|---|-------|---|----------| -| EXT01 | RM | 1 | RM | 1 | RM | -| 000001 | RM[0] | 1 | RM[1] | 1 | RM[2:23] | - -Following the prefix will be the suffix: this is simply a 32-bit v3.0B / v3.1 -instruction. That instruction becomes "prefixed" with the SVP context: the -Remapped Encoding field (RM). 
- -It is important to note that unlike v3.1 64-bit prefixed instructions +It is important to note that unlike EXT1xx 64-bit prefixed instructions there is insufficient space in `RM` to provide identification of -any SVP64 Fields without first partially decoding the -32-bit suffix. Similar to the "Forms" (X-Form, D-Form) the -`RM` format is individually associated with every instruction. +any SVP64 Fields without first partially decoding the 32-bit suffix. +Similar to the "Forms" (X-Form, D-Form) the `RM` format is individually +associated with every instruction. However this still does not adversely +affect Multi-Issue Decoding because the identification of the *length* +of anything in the 64-bit space has been kept brutally simple (EXT009), +and further decoding of any number of 64-bit Encodings in parallel at +that point is fully independent. -Extreme caution and care must therefore be taken -when extending SVP64 in future, to not create unnecessary relationships -between prefix and suffix that could complicate decoding, adding latency. +Extreme caution and care must be taken when extending SVP64 +in future, to not create unnecessary relationships between prefix and +suffix that could complicate decoding, adding latency. 
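Editorial sketch (not part of the patch): because the spec numbers bits in MSB0 order, turning the ranges 8-31, 32-37 and 38-63 into shifts is easy to get wrong. The C below shows one way to do it, assuming the whole 64-bit encoding is held in a single `uint64_t` with the prefix in the upper 32 bits. The struct and the names `svp64_split` and `rm_bits` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t rm;          /* RM[0:23], MSB0 bits 8:31 of the encoding */
    uint32_t suffix_po;   /* Primary Opcode of the suffix, MSB0 bits 32:37 */
    uint32_t suffix_rest; /* remainder of the Defined Word, MSB0 bits 38:63 */
} svp64_split_t;

/* MSB0 bit n of a 64-bit value is LSB0 bit (63 - n). */
svp64_split_t svp64_split(uint64_t insn) {
    svp64_split_t s;
    s.rm          = (uint32_t)((insn >> 32) & 0xFFFFFFu);
    s.suffix_po   = (uint32_t)((insn >> 26) & 0x3Fu);
    s.suffix_rest = (uint32_t)(insn & 0x3FFFFFFu);
    return s;
}

/* Extract RM[lo:hi] (inclusive, MSB0 within the 24-bit RM field). */
uint32_t rm_bits(uint32_t rm, int lo, int hi) {
    return (rm >> (23 - hi)) & ((1u << (hi - lo + 1)) - 1u);
}
```

The split depends only on fixed bit positions, which matches the point made above that the length and layout of a 64-bit encoding can be identified before the suffix is decoded.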
-# Common RM fields +## Common RM fields The following fields are common to all Remapped Encodings: @@ -201,7 +525,7 @@ The following fields are common to all Remapped Encodings: |------------|------------|----------------------------------------| | MASKMODE | `0` | Execution (predication) Mask Kind | | MASK | `1:3` | Execution Mask | -| SUBVL | `8:9` | Sub-vector length | +| SUBVL | `8:9` | Sub-vector length | The following fields are optional or encoded differently depending on context after decoding of the Scalar suffix: @@ -210,46 +534,55 @@ on context after decoding of the Scalar suffix: |------------|------------|----------------------------------------| | ELWIDTH | `4:5` | Element Width | | ELWIDTH_SRC | `6:7` | Element Width for Source | -| EXTRA | `10:18` | Register Extra encoding | +| EXTRA | `10:18` | Register Extra encoding | | MODE | `19:23` | changes Vector behaviour | -* MODE changes the behaviour of the SV operation (result saturation, mapreduce) -* SUBVL groups elements together into vec2, vec3, vec4 for use in 3D and Audio/Video DSP work -* ELWIDTH and ELWIDTH_SRC overrides the instruction's destination and source operand width -* MASK (and MASK_SRC) and MASKMODE provide predication (two types of sources: scalar INT and Vector CR). -* Bits 10 to 18 (EXTRA) are further decoded depending on the RM category for the instruction, which is determined only by decoding the Scalar 32 bit suffix. - -Similar to Power ISA `X-Form` etc. EXTRA bits are given designations, such as `RM-1P-3S1D` which indicates for this example that the operation is to be single-predicated and that there are 3 source operand EXTRA tags and one destination operand tag. - -Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance or increased latency in some implementations due to lane-crossing. - -# Mode - -Mode is an augmentation of SV behaviour. 
Different types of -instructions have different needs, similar to Power ISA -v3.1 64 bit prefix 8LS and MTRR formats apply to different -instruction types. Modes include Reduction, Iteration, arithmetic -saturation, and Fail-First. More specific details in each -section and in the [[svp64/appendix]] +* MODE changes the behaviour of the SV operation (result saturation, + mapreduce) +* SUBVL groups elements together into vec2, vec3, vec4 for use in 3D + and Audio/Video DSP work +* ELWIDTH and ELWIDTH_SRC overrides the instruction's destination and + source operand width +* MASK (and MASK_SRC) and MASKMODE provide predication (two types of + sources: scalar INT and Vector CR). +* Bits 10 to 18 (EXTRA) are further decoded depending on the RM category + for the instruction, which is determined only by decoding the Scalar 32 + bit suffix. + +Similar to Power ISA `X-Form` etc. EXTRA bits are given designations, +such as `RM-1P-3S1D` which indicates for this example that the operation +is to be single-predicated and that there are 3 source operand EXTRA +tags and one destination operand tag. + +Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance +or increased latency in some implementations due to lane-crossing. + +## Mode + +Mode is an augmentation of SV behaviour. Different types of instructions +have different needs, similar to Power ISA v3.1 64 bit prefix 8LS and MTRR +formats apply to different instruction types. Modes include Reduction, +Iteration, arithmetic saturation, and Fail-First. More specific details +in each section and in the [[svp64/appendix]] * For condition register operations see [[sv/cr_ops]] * For LD/ST Modes, see [[sv/ldst]]. * For Branch modes, see [[sv/branches]] * For arithmetic and logical, see [[sv/normal]] -# ELWIDTH Encoding +## ELWIDTH Encoding -Default behaviour is set to 0b00 so that zeros follow the convention of -`scalar identity behaviour`. In this case it means that elwidth overrides -are not applicable. 
Thus if a 32 bit instruction operates on 32 bit, -`elwidth=0b00` specifies that this behaviour is unmodified. Likewise -when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00` -states that, again, the behaviour is not to be modified. +Default behaviour is set to 0b00 so that zeros follow the convention +of `scalar identity behaviour`. In this case it means that elwidth +overrides are not applicable. Thus if a 32 bit instruction operates +on 32 bit, `elwidth=0b00` specifies that this behaviour is unmodified. +Likewise when a processor is switched from 64 bit to 32 bit mode, +`elwidth=0b00` states that, again, the behaviour is not to be modified. Only when elwidth is nonzero is the element width overridden to the explicitly required value. -## Elwidth for Integers: +### Elwidth for Integers: | Value | Mnemonic | Description | |-------|----------------|------------------------------------| @@ -261,7 +594,7 @@ explicitly required value. This encoding is chosen such that the byte width may be computed as `8<<(3-ew)` -## Elwidth for FP Registers: +### Elwidth for FP Registers: | Value | Mnemonic | Description | |-------|----------------|------------------------------------| @@ -274,25 +607,26 @@ Note: [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) is reserved for a future implementation of SV -Note that any IEEE754 FP operation in Power ISA ending in "s" (`fadds`) shall -perform its operation at **half** the ELWIDTH then padded back out -to ELWIDTH. `sv.fadds/ew=f32` shall perform an IEEE754 FP16 operation that is then "padded" to fill out to an IEEE754 FP32. When ELWIDTH=DEFAULT +Note that any IEEE754 FP operation in Power ISA ending in "s" (`fadds`) +shall perform its operation at **half** the ELWIDTH then padded back out +to ELWIDTH. `sv.fadds/ew=f32` shall perform an IEEE754 FP16 operation +that is then "padded" to fill out to an IEEE754 FP32. 
When ELWIDTH=DEFAULT clearly the behaviour of `sv.fadds` is performed at 32-bit accuracy then padded back out to fit in IEEE754 FP64, exactly as for Scalar -v3.0B "single" FP. Any FP operation ending in "s" where ELWIDTH=f16 -or ELWIDTH=bf16 is reserved and must raise an illegal instruction -(IEEE754 FP8 or BF8 are not defined). +v3.0B "single" FP. Any FP operation ending in "s" where ELWIDTH=f16 or +ELWIDTH=bf16 is reserved and must raise an illegal instruction (IEEE754 +FP8 or BF8 are not defined). -## Elwidth for CRs: +### Elwidth for CRs (no meaning) Element-width overrides for CR Fields has no meaning. The bits are therefore used for other purposes, or when Rc=1, the Elwidth applies to the result being tested (a GPR or FPR), but not to the Vector of CR Fields. -# SUBVL Encoding +## SUBVL Encoding -the default for SUBVL is 1 and its encoding is 0b00 to indicate that +The default for SUBVL is 1 and its encoding is 0b00 to indicate that SUBVL is effectively disabled (a SUBVL for-loop of only one element). this lines up in combination with all other "default is all zeros" behaviour. @@ -307,19 +641,15 @@ The SUBVL encoding value may be thought of as an inclusive range of a sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore this may be considered to be elements 0b00 to 0b01 inclusive. -# MASK/MASK_SRC & MASKMODE Encoding - -TODO: rename MASK_KIND to MASKMODE +## MASK/MASK_SRC & MASKMODE Encoding One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two types may not be mixed. -Special note: to disable predication this field must -be set to zero in combination with Integer Predication also being set -to 0b000. this has the effect of enabling "all 1s" in the predicate -mask, which is equivalent to "not having any predication at all" -and consequently, in combination with all other default zeros, fully -disables SV (`scalar identity behaviour`). 
+Special note: to disable predication this field must be set to zero in +combination with Integer Predication also being set to 0b000. this has the +effect of enabling "all 1s" in the predicate mask, which is equivalent to +"not having any predication at all". `MASKMODE` may be set to one of 2 values: @@ -329,32 +659,33 @@ disables SV (`scalar identity behaviour`). | 1 | MASK/MASK_SRC are encoded using CR-based Predication | Integer Twin predication has a second set of 3 bits that uses the same -encoding thus allowing either the same register (r3, r10 or r31) to be used -for both src and dest, or different regs (one for src, one for dest). +encoding thus allowing either the same register (r3, r10 or r31) to be +used for both src and dest, or different regs (one for src, one for dest). Likewise CR based twin predication has a second set of 3 bits, allowing a different test to be applied. -Note that it is assumed that Predicate Masks (whether INT or CR) -are read *before* the operations proceed. In practice (for CR Fields) -this creates an unnecessary block on parallelism. Therefore, -it is up to the programmer to ensure that the CR fields used as -Predicate Masks are not being written to by any parallel Vector Loop. -Doing so results in **UNDEFINED** behaviour, according to the definition -outlined in the Power ISA v3.0B Specification. +Note that it is assumed that Predicate Masks (whether INT or CR) are +read *before* the operations proceed. In practice (for CR Fields) +this creates an unnecessary block on parallelism. Therefore, it is up +to the programmer to ensure that the CR fields used as Predicate Masks +are not being written to by any parallel Vector Loop. Doing so results +in **UNDEFINED** behaviour, according to the definition outlined in the +Power ISA v3.0B Specification. 
Hardware Implementations are therefore free and clear to delay reading of individual CR fields until the actual predicated element operation -needs to take place, safe in the knowledge that no programmer will -have issued a Vector Instruction where previous elements could have -overwritten (destroyed) not-yet-executed CR-Predicated element operations. +needs to take place, safe in the knowledge that no programmer will have +issued a Vector Instruction where previous elements could have overwritten +(destroyed) not-yet-executed CR-Predicated element operations. -## Integer Predication (MASKMODE=0) +### Integer Predication (MASKMODE=0) When the predicate mode bit is zero the 3 bits are interpreted as below. Twin predication has an identical 3 bit field similarly encoded. -`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the following meaning: +`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the +following meaning: | Value | Mnemonic | Element `i` enabled if: | |-------|----------|------------------------------| @@ -367,14 +698,16 @@ Twin predication has an identical 3 bit field similarly encoded. | 110 | R30 | `R30 & (1 << i)` is non-zero | | 111 | ~R30 | `R30 & (1 << i)` is zero | -r10 and r30 are at the high end of temporary and unused registers, so as not to interfere with register allocation from ABIs. +r10 and r30 are at the high end of temporary and unused registers, +so as not to interfere with register allocation from ABIs. -## CR-based Predication (MASKMODE=1) +### CR-based Predication (MASKMODE=1) When the predicate mode bit is one the 3 bits are interpreted as below. Twin predication has an identical 3 bit field similarly encoded. 
-`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the following meaning: +`MASK` and `MASK_SRC` may be set to one of 8 values, to provide the +following meaning: | Value | Mnemonic | Element `i` is enabled if | |-------|----------|--------------------------| @@ -387,42 +720,58 @@ Twin predication has an identical 3 bit field similarly encoded. | 110 | so/un | `CR[offs+i].FU` is set | | 111 | ns/nu | `CR[offs+i].FU` is clear | -CR based predication. TODO: select alternate CR for twin predication? see -[[discussion]] Overlap of the two CR based predicates must be taken -into account, so the starting point for one of them must be suitably -high, or accept that for twin predication VL must not exceed the range -where overlap will occur, *or* that they use the same starting point -but select different *bits* of the same CRs +`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised +Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD). -`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD). +The CR Predicates chosen must start on a boundary that Vectorised CR +operations can access cleanly, in full. With EXTRA2 restricting starting +points to multiples of 8 (CR0, CR8, CR16...) both Vectorised Rc=1 and +CR Predicate Masks have to be adapted to fit on these boundaries as well. -The CR Predicates chosen must start on a boundary that Vectorised -CR operations can access cleanly, in full. -With EXTRA2 restricting starting points -to multiples of 8 (CR0, CR8, CR16...) both Vectorised Rc=1 and CR Predicate -Masks have to be adapted to fit on these boundaries as well. +## Extra Remapped Encoding -# Extra Remapped Encoding +Shows all instruction-specific fields in the Remapped Encoding +`RM[10:18]` for all instruction variants. Note that due to the very +tight space, the encoding mode is *not* included in the prefix itself. 
+The mode is "applied", similar to Power ISA "Forms" (X-Form, D-Form) +on a per-instruction basis, and, like "Forms" are given a designation +(below) of the form `RM-nP-nSnD`. The full list of which instructions +use which remaps is here [[opcode_regs_deduped]]. -Shows all instruction-specific fields in the Remapped Encoding `RM[10:18]` for all instruction variants. Note that due to the very tight space, the encoding mode is *not* included in the prefix itself. The mode is "applied", similar to Power ISA "Forms" (X-Form, D-Form) on a per-instruction basis, and, like "Forms" are given a designation (below) of the form `RM-nP-nSnD`. The full list of which instructions use which remaps is here [[opcode_regs_deduped]]. (*Machine-readable CSV files have been provided which will make the task of creating SV-aware ISA decoders easier*). +**Please note the following**: -These mappings are part of the SVP64 Specification in exactly the same +``` + Machine-readable CSV files have been autogenerated which will make the + task of creating SV-aware ISA decoders, documentation, assembler tools + compiler tools Simulators documentation all aspects of SVP64 easier + and less prone to mistakes. Please avoid manual re-creation of + information from the written specification wording in this chapter, + and use the CSV files or use the Canonical tool which creates the CSV + files, named sv_analysis.py. The information contained within + sv_analysis.py is considered to be part of this Specification, even + encoded as it is in python3. +``` + + +The mappings are part of the SVP64 Specification in exactly the same way as X-Form, D-Form. New Scalar instructions added to the Power ISA will need a corresponding SVP64 Mapping, which can be derived by-rote from examining the Register "Profile" of the instruction. -There are two categories: Single and Twin Predication. 
-Due to space considerations further subdivision of Single Predication -is based on whether the number of src operands is 2 or 3. With only -9 bits available some compromises have to be made. +There are two categories: Single and Twin Predication. Due to space +considerations further subdivision of Single Predication is based on +whether the number of src operands is 2 or 3. With only 9 bits available +some compromises have to be made. -* `RM-1P-3S1D` Single Predication dest/src1/2/3, applies to 4-operand instructions (fmadd, isel, madd). -* `RM-1P-2S1D` Single Predication dest/src1/2 applies to 3-operand instructions (src1 src2 dest) +* `RM-1P-3S1D` Single Predication dest/src1/2/3, applies to 4-operand + instructions (fmadd, isel, madd). +* `RM-1P-2S1D` Single Predication dest/src1/2 applies to 3-operand + instructions (src1 src2 dest) * `RM-2P-1S1D` Twin Predication (src=1, dest=1) * `RM-2P-2S1D` Twin Predication (src=2, dest=1) primarily for LDST (Indexed) * `RM-2P-1S2D` Twin Predication (src=1, dest=2) primarily for LDST Update -## RM-1P-3S1D +### RM-1P-3S1D | Field Name | Field bits | Description | |------------|------------|----------------------------------------| @@ -437,7 +786,7 @@ These are for 3 operand in and either 1 or 2 out instructions. such as `maddedu` have an implicit second destination, RS, the selection of which is determined by bit 18. -## RM-1P-2S1D +### RM-1P-2S1D | Field Name | Field bits | Description | |------------|------------|-------------------------------------------| @@ -446,13 +795,14 @@ selection of which is determined by bit 18. | Rsrc2\_EXTRA3 | `16:18` | extends Rsrc3 | These are for 2 operand 1 dest instructions, such as `add RT, RA, -RB`. However also included are unusual instructions with an implicit dest -that is identical to its src reg, such as `rlwinmi`. +RB`. However also included are unusual instructions with an implicit +dest that is identical to its src reg, such as `rlwinmi`. 
-Normally, with instructions such as `rlwinmi`, the scalar v3.0B ISA would not have sufficient bit fields to allow -an alternative destination. With SV however this becomes possible. -Therefore, the fact that the dest is implicitly also a src should not -mislead: due to the *prefix* they are different SV regs. +Normally, with instructions such as `rlwinmi`, the scalar v3.0B ISA would +not have sufficient bit fields to allow an alternative destination. +With SV however this becomes possible. Therefore, the fact that the +dest is implicitly also a src should not mislead: due to the *prefix* +they are different SV regs. * `rlwimi RA, RS, ...` * Rsrc1_EXTRA3 applies to RS as the first src @@ -463,7 +813,7 @@ With the addition of the EXTRA bits, the three registers each may be *independently* made vector or scalar, and be independently augmented to 7 bits in length. -## RM-2P-1S1D/2S +### RM-2P-1S1D/2S | Field Name | Field bits | Description | |------------|------------|----------------------------| @@ -473,22 +823,22 @@ augmented to 7 bits in length. `RM-2P-2S` is for `stw` etc. and is Rsrc1 Rsrc2. -## RM-1P-2S1D +### RM-1P-2S1D single-predicate, three registers (2 read, 1 write) - + | Field Name | Field bits | Description | |------------|------------|----------------------------| | Rdest_EXTRA3 | `10:12` | extends Rdest | | Rsrc1_EXTRA3 | `13:15` | extends Rsrc1 | | Rsrc2_EXTRA3 | `16:18` | extends Rsrc2 | -## RM-2P-2S1D/1S2D/3S +### RM-2P-2S1D/1S2D/3S The primary purpose for this encoding is for Twin Predication on LOAD and STORE operations. see [[sv/ldst]] for detailed anslysis. 
-RM-2P-2S1D: +**RM-2P-2S1D:** | Field Name | Field bits | Description | |------------|------------|----------------------------| @@ -497,19 +847,39 @@ RM-2P-2S1D: | Rsrc2_EXTRA2 | `14:15` | extends Rsrc2 (R\*\_EXTRA2 Encoding) | | MASK_SRC | `16:18` | Execution Mask for Source | -Note that for 1S2P the EXTRA2 dest and src names are switched (Rsrc_EXTRA2 +**RM-2P-1S2D:** + +For RM-2P-1S2D the EXTRA2 dest and src names are switched (Rsrc_EXTRA2 is in bits 10:11, Rdest1_EXTRA2 in 12:13) -Also that for 3S (to cover `stdx` etc.) the names are switched to 3 src: Rsrc1_EXTRA2, Rsrc2_EXTRA2, Rsrc3_EXTRA2. +| Field Name | Field bits | Description | +|------------|------------|----------------------------| +| Rsrc2_EXTRA2 | `10:11` | extends Rsrc2 (R\*\_EXTRA2 Encoding) | +| Rsrc1_EXTRA2 | `12:13` | extends Rsrc1 (R\*\_EXTRA2 Encoding) | +| Rdest_EXTRA2 | `14:15` | extends Rdest (R\*\_EXTRA2 Encoding) | +| MASK_SRC | `16:18` | Execution Mask for Source | -Note also that LD with update indexed, which takes 2 src and 2 dest -(e.g. `lhaux RT,RA,RB`), does not have room for 4 registers and also -Twin Predication. therefore these are treated as RM-2P-2S1D and the -src spec for RA is also used for the same RA as a dest. +**RM-2P-3S:** -Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance or increased latency in some implementations due to lane-crossing. +Also that for RM-2P-3S (to cover `stdx` etc.) the names are switched to 3 src: +Rsrc1_EXTRA2, Rsrc2_EXTRA2, Rsrc3_EXTRA2. -# R\*\_EXTRA2/3 +| Field Name | Field bits | Description | +|------------|------------|----------------------------| +| Rsrc1_EXTRA2 | `10:11` | extends Rsrc1 (R\*\_EXTRA2 Encoding) | +| Rsrc2_EXTRA2 | `12:13` | extends Rsrc2 (R\*\_EXTRA2 Encoding) | +| Rsrc3_EXTRA2 | `14:15` | extends Rsrc3 (R\*\_EXTRA2 Encoding) | +| MASK_SRC | `16:18` | Execution Mask for Source | + +Note also that LD with update indexed, which takes 2 src and +creates 2 dest registers (e.g. 
`lhaux RT,RA,RB`), does not have room +for 4 registers and also Twin Predication. Therefore these are treated as +RM-2P-2S1D and the src spec for RA is also used for the same RA as a dest. + +Note that if ELWIDTH != ELWIDTH_SRC this may result in reduced performance +or increased latency in some implementations due to lane-crossing. + +## R\*\_EXTRA2/3 EXTRA is the means by which two things are achieved: @@ -519,8 +889,8 @@ EXTRA is the means by which two things are achieved: The register files are therefore extended: -* INT is extended from r0-31 to r0-127 -* FP is extended from fp0-32 to fp0-fp127 +* INT (GPR) is extended from r0-31 to r0-127 +* FP (FPR) is extended from fp0-32 to fp0-fp127 * CR Fields are extended from CR0-7 to CR0-127 However due to pressure in `RM.EXTRA` not all these registers @@ -528,15 +898,16 @@ are accessible by all instructions, particularly those with a large number of operands (`madd`, `isel`). In the following tables register numbers are constructed from the -standard v3.0B / v3.1B 32 bit register field (RA, FRA) and the EXTRA2 -or EXTRA3 field from the SV Prefix, determined by the specific -RM-xx-yyyy designation for a given instruction. -The prefixing is arranged so that +standard v3.0B / v3.1B 32 bit register field (RA, FRA) and the EXTRA2 or +EXTRA3 field from the SV Prefix, determined by the specific RM-xx-yyyy +designation for a given instruction. The prefixing is arranged so that interoperability between prefixing and nonprefixing of scalar registers is direct and convenient (when the EXTRA field is all zeros). 
-A pseudocode algorithm explains the relationship, for INT/FP (see [[svp64/appendix]] for CRs) +A pseudocode algorithm explains the relationship, for INT/FP (see +[[svp64/appendix]] for CRs) +``` if extra3_mode: spec = EXTRA3 else: @@ -545,17 +916,18 @@ A pseudocode algorithm explains the relationship, for INT/FP (see [[svp64/append return (RA << 2) | spec[1:2] else: # scalar return (spec[1:2] << 5) | RA +``` Future versions may extend to 256 by shifting Vector numbering up. Scalar will not be altered. Note that in some cases the range of starting points for Vectors -is limited. +is limited. -## INT/FP EXTRA3 +### INT/FP EXTRA3 -If EXTRA3 is zero, maps to -"scalar identity" (scalar Power ISA field naming). +If EXTRA3 is zero, maps to "scalar identity" (scalar Power ISA field +naming). Fields are as follows: @@ -568,7 +940,9 @@ Fields are as follows: * MSB..LSB: the bit field showing how the register opcode field combines with EXTRA to give (extend) the register number (GPR) -| Value | Mode | Range/Inc | 6..0 | +Encoding shown in LSB0: MSB down to LSB (MSB 6..0 LSB) + +| Value | Mode | Range/Inc | 6..0 | |-----------|-------|---------------|---------------------| | 000 | Scalar | `r0-r31`/1 | `0b00 RA` | | 001 | Scalar | `r32-r63`/1 | `0b01 RA` | @@ -579,13 +953,15 @@ Fields are as follows: | 110 | Vector | `r2-r126`/4 | `RA 0b10` | | 111 | Vector | `r3-r127`/4 | `RA 0b11` | -## INT/FP EXTRA2 +### INT/FP EXTRA2 -If EXTRA2 is zero will map to -"scalar identity behaviour" i.e Scalar Power ISA register naming: +If EXTRA2 is zero will map to "scalar identity behaviour" i.e Scalar +Power ISA register naming: -| Value | Mode | Range/inc | 6..0 | -|-----------|-------|---------------|-----------| +Encoding shown in LSB0: MSB down to LSB (MSB 6..0 LSB) + +| Value | Mode | Range/inc | 6..0 | +|----------|-------|---------------|-----------| | 00 | Scalar | `r0-r31`/1 | `0b00 RA` | | 01 | Scalar | `r32-r63`/1 | `0b01 RA` | | 10 | Vector | `r0-r124`/4 | `RA 0b00` | @@ -599,26 
+975,29 @@ If EXTRA2 is zero will map to as there is insufficient bits to cover the full range. -## CR Field EXTRA3 +### CR Field EXTRA3 -CR Field encoding is essentially the same but made more complex due to CRs being bit-based. See [[svp64/appendix]] for explanation and pseudocode. +CR Field encoding is essentially the same but made more complex due to CRs +being bit-based, because the application of SVP64 element-numbering applies +to the CR *Field* numbering not the CR register *bit* numbering. Note that Vectors may only start from `CR0, CR4, CR8, CR12, CR16, CR20`... and Scalars may only go from `CR0, CR1, ... CR31` -Encoding shown MSB down to LSB +Encoding shown in LSB0: MSB down to LSB (MSB 8..5 4..2 1..0 LSB), +BA ranges are in MSB0. For a 5-bit operand (BA, BB, BT): | Value | Mode | Range/Inc | 8..5 | 4..2 | 1..0 | |-------|------|---------------|-----------| --------|---------| -| 000 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[4:2] | BA[1:0] | -| 001 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[4:2] | BA[1:0] | -| 010 | Scalar | `CR16-CR23`/1 | 0b0010 | BA[4:2] | BA[1:0] | -| 011 | Scalar | `CR24-CR31`/1 | 0b0011 | BA[4:2] | BA[1:0] | -| 100 | Vector | `CR0-CR112`/16 | BA[4:2] 0 | 0b000 | BA[1:0] | -| 101 | Vector | `CR4-CR116`/16 | BA[4:2] 0 | 0b100 | BA[1:0] | -| 110 | Vector | `CR8-CR120`/16 | BA[4:2] 1 | 0b000 | BA[1:0] | -| 111 | Vector | `CR12-CR124`/16 | BA[4:2] 1 | 0b100 | BA[1:0] | +| 000 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[0:2] | BA[3:4] | +| 001 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[0:2] | BA[3:4] | +| 010 | Scalar | `CR16-CR23`/1 | 0b0010 | BA[0:2] | BA[3:4] | +| 011 | Scalar | `CR24-CR31`/1 | 0b0011 | BA[0:2] | BA[3:4] | +| 100 | Vector | `CR0-CR112`/16 | BA[0:2] 0 | 0b000 | BA[3:4] | +| 101 | Vector | `CR4-CR116`/16 | BA[0:2] 0 | 0b100 | BA[3:4] | +| 110 | Vector | `CR8-CR120`/16 | BA[0:2] 1 | 0b000 | BA[3:4] | +| 111 | Vector | `CR12-CR124`/16 | BA[0:2] 1 | 0b100 | BA[3:4] | For a 3-bit operand (e.g. 
BFA): @@ -633,22 +1012,24 @@ For a 3-bit operand (e.g. BFA): | 110 | Vector | `CR8-CR120`/16 | BFA 1 | 0b000 | | 111 | Vector | `CR12-CR124`/16 | BFA 1 | 0b100 | -## CR EXTRA2 +### CR EXTRA2 -CR encoding is essentially the same but made more complex due to CRs being bit-based. See separate section for explanation and pseudocode. +CR encoding is essentially the same but made more complex due to CRs +being bit-based, because the application of SVP64 element-numbering applies +to the CR *Field* numbering not the CR register *bit* numbering. Note that Vectors may only start from CR0, CR8, CR16, CR24, CR32... - -Encoding shown MSB down to LSB +Encoding shown in LSB0: MSB down to LSB (MSB 8..5 4..2 1..0 LSB), +BA ranges are in MSB0. For a 5-bit operand (BA, BB, BC): | Value | Mode | Range/Inc | 8..5 | 4..2 | 1..0 | |-------|--------|----------------|---------|---------|---------| -| 00 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[4:2] | BA[1:0] | -| 01 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[4:2] | BA[1:0] | -| 10 | Vector | `CR0-CR112`/16 | BA[4:2] 0 | 0b000 | BA[1:0] | -| 11 | Vector | `CR8-CR120`/16 | BA[4:2] 1 | 0b000 | BA[1:0] | +| 00 | Scalar | `CR0-CR7`/1 | 0b0000 | BA[0:2] | BA[3:4] | +| 01 | Scalar | `CR8-CR15`/1 | 0b0001 | BA[0:2] | BA[3:4] | +| 10 | Vector | `CR0-CR112`/16 | BA[0:2] 0 | 0b000 | BA[3:4] | +| 11 | Vector | `CR8-CR120`/16 | BA[0:2] 1 | 0b000 | BA[3:4] | For a 3-bit operand (e.g. BFA): @@ -659,7 +1040,13 @@ For a 3-bit operand (e.g. BFA): | 10 | Vector | `CR0-CR112`/16 | BFA 0 | 0b000 | | 11 | Vector | `CR8-CR120`/16 | BFA 1 | 0b000 | -# Appendix +## Appendix Now at its own page: [[svp64/appendix]] +-------- + +\newpage{} + +[[!tag standards]] + -- 2.30.2
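
The INT/FP EXTRA3 register-number extension shown in the hunks above can be sketched in Python. This is a minimal sketch and not part of the patch. The function name `decode_extra3` is hypothetical. It assumes that the top bit of the 3-bit EXTRA3 value selects Vector and that the remaining two bits extend the register number, which agrees with the table rows visible in the diff (`000`/`001` Scalar, `110`/`111` Vector) and with the two `return` expressions of the pseudocode. CR Fields are not covered.

```python
def decode_extra3(ra, extra3):
    """Combine a 5-bit register field with a 3-bit EXTRA3 value.

    Returns (register_number, is_vector). Hypothetical helper for
    illustration only; it mirrors the INT/FP pseudocode, not CR Fields.
    """
    if not 0 <= ra < 32:
        raise ValueError("RA must be a 5-bit field value")
    if not 0 <= extra3 < 8:
        raise ValueError("EXTRA3 must be a 3-bit value")

    is_vector = bool(extra3 >> 2)  # assumption: top bit selects Vector
    low2 = extra3 & 0b11           # the other two bits extend the number

    if is_vector:
        # Vector: RA supplies the upper bits, so start points step by 4
        return (ra << 2) | low2, True
    # Scalar: RA supplies the lower 5 bits, EXTRA bits select the bank
    return (low2 << 5) | ra, False


if __name__ == "__main__":
    for extra3 in (0b000, 0b001, 0b110, 0b111):
        number, vec = decode_extra3(5, extra3)
        kind = "Vector" if vec else "Scalar"
        print(f"EXTRA3={extra3:03b} RA=5 -> r{number} ({kind})")
```

With EXTRA3 all zeros the function returns the unmodified field value, which is the "scalar identity" case the text describes.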