From 3b9153d6efc5e898061801cb92b3dee75a607891 Mon Sep 17 00:00:00 2001 From: Luke Kenneth Casson Leighton Date: Sat, 19 Dec 2020 16:35:33 +0000 Subject: [PATCH] whitespace --- openpower/sv/svp_rewrite/svp64.mdwn | 247 +++++++++++++++++++--------- 1 file changed, 173 insertions(+), 74 deletions(-) diff --git a/openpower/sv/svp_rewrite/svp64.mdwn b/openpower/sv/svp_rewrite/svp64.mdwn index ea856df57..b3048cce6 100644 --- a/openpower/sv/svp_rewrite/svp64.mdwn +++ b/openpower/sv/svp_rewrite/svp64.mdwn @@ -1,34 +1,44 @@ # Rewrite of SVP64 for OpenPower ISA v3.1 * [[svp64/discussion]] +* -The plan is to create an encoding for SVP64, then to create an encoding for -SVP48, then to reorganize them both to improve field overlap, reducing the -amount of decoder hardware necessary. +The plan is to create an encoding for SVP64, then to create an encoding +for SVP48, then to reorganize them both to improve field overlap, +reducing the amount of decoder hardware necessary. -All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB and -counting up as you move to the LSB end). All bit ranges are inclusive (so -`4:6` means bits 4, 5, and 6). +All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB +and counting up as you move to the LSB end). All bit ranges are inclusive +(so `4:6` means bits 4, 5, and 6). -64-bit instructions are split into two 32-bit words, the prefix and the suffix. The prefix always comes before the suffix in PC order. +64-bit instructions are split into two 32-bit words, the prefix and the +suffix. The prefix always comes before the suffix in PC order. -SVP64 is designed so that when the prefix is all zeros, no effect or influence occurs (no augmentation) such that all standard OpenPOWER v3.B instructions may be active at that time, in full (and SV is quiescent). The corollary is that when the SV prefix is nonzero, alternative meanings may be given to all and any instructions. 
+SVP64 is designed so that when the prefix is all zeros, no effect or +influence occurs (no augmentation) such that all standard OpenPOWER +v3.B instructions may be active at that time, in full (and SV is +quiescent). The corollary is that when the SV prefix is nonzero, +alternative meanings may be given to all and any instructions. # Definition of Reserved in this spec. -For the new fields added in SVP64, instructions that have any of their fields set to a reserved value must cause an illegal instruction trap, to allow emulation of future instruction sets. +For the new fields added in SVP64, instructions that have any of their +fields set to a reserved value must cause an illegal instruction trap, +to allow emulation of future instruction sets. This is unlike OpenPower ISA v3.1, which doesn't require a CPU to trap. # Remapped Encoding (`RM[0:23]`) -To allow relatively easy remapping of which portions of the Prefix Opcode Map -are used for SVP64 without needing to rewrite a large portion of the SVP64 -spec, a mapping is defined from the OpenPower v3.1 prefix bits to a new 24-bit -Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` at the LSB. +To allow relatively easy remapping of which portions of the Prefix Opcode +Map are used for SVP64 without needing to rewrite a large portion of the +SVP64 spec, a mapping is defined from the OpenPower v3.1 prefix bits to +a new 24-bit Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` +at the LSB. + +The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding +is defined in the Prefix Fields section. -The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding is -defined in the Prefix Fields section. ## Prefix Opcode Map (64-bit instruction encoding) (prefix bits 6:11) (shows both PowerISA v3.1 instructions as well as new SVP instructions; empty spaces are yet-to-be-allocated Illegal Instructions) @@ -58,7 +68,10 @@ defined in the Prefix Fields section. 
# Remapped Encoding Fields -Shows all fields in the Remapped Encoding `RM[0:23]` for all instruction variants. There are two categories: Single and Twin Predication. Due to space considerations further subdivision of Single Predication is based on whether the number of src operands is 2 or 3. +Shows all fields in the Remapped Encoding `RM[0:23]` for all instruction +variants. There are two categories: Single and Twin Predication. +Due to space considerations, further subdivision of Single Predication +is based on whether the number of src operands is 2 or 3. * `RM-1P-3S1D` Single Predication dest/src1/2/3, applies to 4-operand instructions (fmadd, isel, madd). @@ -71,14 +84,14 @@ Shows all fields in the Remapped Encoding `RM[0:23]` for all instruction variant | Field Name | Field bits | Description | |------------|------------|------------------------------------------------| -| MASK_KIND | `0` | Execution Mask Kind | +| MASK\_KIND | `0` | Execution Mask Kind | | MASK | `1:3` | Execution Mask | | ELWIDTH | `4:5` | Element Width | | SUBVL | `6:7` | Sub-vector length | -| Rdest_EXTRA2 | `8:9` | extra bits for Rdest (R\*_EXTRA2 Encoding) | -| Rsrc1_EXTRA2 | `10:11` | extra bits for Rsrc1 (R\*_EXTRA2 Encoding) | -| Rsrc2_EXTRA2 | `12:13` | extra bits for Rsrc2 (R\*_EXTRA2 Encoding) | -| Rsrc3_EXTRA2 | `14:15` | extra bits for Rsrc3 (R\*_EXTRA2 Encoding| +| Rdest\_EXTRA2 | `8:9` | extra bits for Rdest (R\*\_EXTRA2 Encoding) | +| Rsrc1\_EXTRA2 | `10:11` | extra bits for Rsrc1 (R\*\_EXTRA2 Encoding) | +| Rsrc2\_EXTRA2 | `12:13` | extra bits for Rsrc2 (R\*\_EXTRA2 Encoding) | +| Rsrc3\_EXTRA2 | `14:15` | extra bits for Rsrc3 (R\*\_EXTRA2 Encoding) | | reserved | `16` | reserved | | MODE | `19:23` | see [[discussion]] | @@ -87,25 +100,32 @@ Shows all fields in the Remapped Encoding `RM[0:23]` for all instruction variant | Field Name | Field bits | Description | |------------|------------|------------------------------------------------| -| MASK_KIND | `0` | Execution Mask Kind 
| +| MASK\_KIND | `0` | Execution Mask Kind | | MASK | `1:3` | Execution Mask | | ELWIDTH | `4:5` | Element Width | | SUBVL | `6:7` | Sub-vector length | -| Rdest_EXTRA3 | `8:10` | extra bits for Rdest (Uses R\*_EXTRA3 Encoding) | -| Rsrc1_EXTRA3 | `11:13` | extra bits for Rsrc1 (Uses R\*_EXTRA3 Encoding) | -| Rsrc2_EXTRA3 | `14:16` | extra bits for Rsrc3 (Uses R\*_EXTRA3 Encoding) | +| Rdest\_EXTRA3 | `8:10` | extra bits for Rdest (Uses R\*\_EXTRA3 Encoding) | +| Rsrc1\_EXTRA3 | `11:13` | extra bits for Rsrc1 (Uses R\*\_EXTRA3 Encoding) | +| Rsrc2\_EXTRA3 | `14:16` | extra bits for Rsrc2 (Uses R\*\_EXTRA3 Encoding) | | MODE | `19:23` | see [[discussion]] | -These are for 2 operand 1 dest instructions, such as `add RT, RA, RB`. However also included are unusual instructions with the same src and dest, such as `rlwinmi`. +These are for 2 operand 1 dest instructions, such as `add RT, RA, +RB`. However also included are unusual instructions with the same src +and dest, such as `rlwimi`. -Normally, the scalar v3.0B ISA would not have sufficient bits to allow an alternative destination. With SV however this becomes possible. Therefore, the fact that the dest is implicitly also a src should not mislead: rhey are different SV regs. +Normally, the scalar v3.0B ISA would not have sufficient bits to allow +an alternative destination. With SV however this becomes possible. +Therefore, the fact that the dest is implicitly also a src should not +mislead: they are different SV regs. * `rlwimi RA, RS, ...` * Rsrc1_EXTRA3 applies to RS as the first src * Rsrc2_EXTRA3 applies to RA as the secomd src * Rdest_EXTRA3 applies to RA to create an **independent** dest. -Otherwise the normal SV hardware for-loop applies. The three registers each may be independently made vector or scalar, and may independently augmented to 7 bits in length. +Otherwise the normal SV hardware for-loop applies. 
The three registers +each may be independently made vector or scalar, and may be independently +augmented to 7 bits in length. ## RM-2P-1S1D @@ -121,11 +141,14 @@ Otherwise the normal SV hardware for-loop applies. The three registers each may | ELWIDTH_SRC | `17:18` | Element Width for Source | | MODE | `19:23` | see [[discussion]] | -note in [[discussion]]: TODO, evaluate if 2nd SUBVL should be added. conclusion: no. 2nd SUBVL makes no sense except for mv, and that is covered by [[mv.vec]] +note in [[discussion]]: TODO, evaluate if 2nd SUBVL should be added. +conclusion: no. 2nd SUBVL makes no sense except for mv, and that is +covered by [[mv.vec]] ## RM-2P-2S1D/1S2D -The primary purpose for this encoding is for Twin Predication on LOAD and STORE operations. see [[sv/ldst]] for detailed anslysis. +The primary purpose for this encoding is for Twin Predication on LOAD +and STORE operations. See [[sv/ldst]] for detailed analysis. RM-2P-2S1D: @@ -135,26 +158,38 @@ RM-2P-2S1D: | MASK | `1:3` | Execution Mask | | ELWIDTH | `4:5` | Element Width | | SUBVL | `6:7` | Sub-vector length | -| Rdest_EXTRA2 | `8:9` | extra bits for Rdest (R\*_EXTRA2 Encoding) | -| Rsrc1_EXTRA2 | `10:11` | extra bits for Rsrc1 (R\*_EXTRA2 Encoding) | -| Rsrc2_EXTRA2 | `12:13` | extra bits for Rsrc2 (R\*_EXTRA2 Encoding) | +| Rdest_EXTRA2 | `8:9` | extra bits for Rdest (R\*\_EXTRA2 Encoding) | +| Rsrc1_EXTRA2 | `10:11` | extra bits for Rsrc1 (R\*\_EXTRA2 Encoding) | +| Rsrc2_EXTRA2 | `12:13` | extra bits for Rsrc2 (R\*\_EXTRA2 Encoding) | | MASK_SRC | `14:16` | Execution Mask for Source | | ELWIDTH_SRC | `17:18` | Element Width for Source | | MODE | `19:23` | see [[discussion]] | -Note that for 1S2P the EXTRA2 dest and src names are switched (Rsrc_EXTRA2 is in bits 8:9, Rdest1_EXTRA2 in 10:11) +Note that for 1S2D the EXTRA2 dest and src names are switched (Rsrc_EXTRA2 +is in bits 8:9, Rdest1_EXTRA2 in 10:11) -Note also that LD with update indexed, which takes 2 src and 2 dest (e.g. 
`lhaux RT,RA,RB`), does not have room for 4 registers and also Twin Predication. therefore these are treated as RM-2P-2S1D and the src spec for RA is also used for the same RA as a dest. +Note also that LD with update indexed, which takes 2 src and 2 dest +(e.g. `lhaux RT,RA,RB`), does not have room for 4 registers and also +Twin Predication. therefore these are treated as RM-2P-2S1D and the +src spec for RA is also used for the same RA as a dest. ## R\*_EXTRA2 and R\*_EXTRA3 Encoding -In the following tables register numbers are constructed from the standard v3.0B / v3.1B 32 bit register field (RA, FRA) and the EXTRA2 or EXTRA3 field from the SV Prefix. The prefixing is arranged so that interoperability between prefixing and nonprefixing of scalar registers is direct and convenient (when the EXTRA field is all zeros). +In the following tables register numbers are constructed from the +standard v3.0B / v3.1B 32 bit register field (RA, FRA) and the EXTRA2 +or EXTRA3 field from the SV Prefix. The prefixing is arranged so that +interoperability between prefixing and nonprefixing of scalar registers +is direct and convenient (when the EXTRA field is all zeros). 3 bit version -alternative which is understandable and, if EXTRA3 is zero, maps to "no effect" (scalar OpenPOWER ISA field naming). also, these are the encodings used in the original SV Prefix scheme. the reason why they were chosen is so that scalar registers in v3.0B and prefixed scalar registers have access to the same 32 registers. +alternative which is understandable and, if EXTRA3 is zero, maps to +"no effect" (scalar OpenPOWER ISA field naming). also, these are the +encodings used in the original SV Prefix scheme. the reason why they +were chosen is so that scalar registers in v3.0B and prefixed scalar +registers have access to the same 32 registers. 
-| R\*_EXTRA3 | Mode | Range | Encoded as | +| R\*\_EXTRA3 | Mode | Range | Encoded as | |-----------|-------|---------------|---------------------| | 000 | Scalar | `r0-r31` | `0b00 RA` | | 001 | Scalar | `r32-r63` | `0b01 RA` | @@ -175,16 +210,18 @@ algorithm for original version: 2 bit version -alternative which is understandable and, if EXTRA2 is zero will map to "no effect" i.e Scalar OpenPOWER register naming: +alternative which is understandable and, if EXTRA2 is zero, will map to +"no effect" i.e. Scalar OpenPOWER register naming: -| R\*_EXTRA2 | Mode | Range | Encoded as | +| R\*\_EXTRA2 | Mode | Range | Encoded as | |-----------|-------|---------------|---------------------| | 00 | Scalar | `r0-r31` | `0b00 RA` | | 01 | Scalar | `r32-r63` | `0b01 RA` | | 10 | Vector | `r0-r124` | `RA 0b00` | | 11 | Vector | `r2-r126` | `RA 0b10` | -algorithm for original version is identical to the 3 bit version except that the spec is shifted up by one bit +algorithm for original version is identical to the 3 bit version except +that the spec is shifted up by one bit spec = EXTRA2 << 1 # same as EXTRA3, shifted if spec[2]: # vector @@ -194,9 +231,15 @@ algorithm for original version is identical to the 3 bit version except that the ## ELWIDTH Encoding -Default behaviour is set to 0b00 so that zeros follow the convention of "npt doing anything". In this case it means that elwidth overrides are not applicable. Thus if a 32 bit instruction operates on 32 bit, `elwidth=0b00` specifies that this behaviour is unmodified. Likewise when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00` states that, again, the behaviour is not to be modified. +Default behaviour is set to 0b00 so that zeros follow the convention of +"not doing anything". In this case it means that elwidth overrides +are not applicable. Thus if a 32 bit instruction operates on 32 bit, +`elwidth=0b00` specifies that this behaviour is unmodified. 
Likewise +when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00` +states that, again, the behaviour is not to be modified. -Only when elwidth is nonzero is the element width overridden to the explicitly required value. +Only when elwidth is nonzero is the element width overridden to the +explicitly required value. ### Elwidth for Integers: @@ -216,46 +259,65 @@ Only when elwidth is nonzero is the element width overridden to the explicitly r | 10 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point | | 11 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point | -Note: [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) +Note: +[`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) is reserved for a future implementation of SV ### Elwidth for CRs: -TODO, important, particularly for crops, mfcr and mtcr, what elwidth even means. instead it may be possible to use the bits as extra indices (EXTRA6) to access the full 64 CRs. TBD, several ideas +TODO, important, particularly for crops, mfcr and mtcr, what elwidth +even means. instead it may be possible to use the bits as extra indices +(EXTRA6) to access the full 64 CRs. TBD, several ideas -The actual width of the CRs cannot be altered: they are 4 bit. Thus, for Rc=1 operations that produce a result and corresponding CR, it is the result to which the elwidth override applies, not the CR. +The actual width of the CRs cannot be altered: they are 4 bit. Thus, +for Rc=1 operations that produce a result and corresponding CR, it is +the result to which the elwidth override applies, not the CR. -As mentioned TBD, this leaves crops etc. to have a meaning defined for elwidth, because these ops are pure explicit CR based. +As mentioned TBD, this leaves crops etc. to have a meaning defined for +elwidth, because these ops are pure explicit CR based. Examples: mfxm may take the extra bits and use them as extra mask bits. 
## SUBVL Encoding -the default for SUBVL is 1 and its encoding is 0b00 to indicate that SUBVL is effectively disabled (a SUBVL for-loop of only one element). this lines up in combination with all other "default is all zeros" behaviour. +the default for SUBVL is 1 and its encoding is 0b00 to indicate that +SUBVL is effectively disabled (a SUBVL for-loop of only one element). this +lines up in combination with all other "default is all zeros" behaviour. -| Value | Mnemonic | Description | -|-------|---------------------|------------------------| -| 00 | `SUBVL=1` (default) | Sub-vector length of 1 | -| 01 | `SUBVL=2` | Sub-vector length of 2 | -| 10 | `SUBVL=3` | Sub-vector length of 3 | -| 11 | `SUBVL=4` | Sub-vector length of 4 | +| Value | Mnemonic | xxx | Description | +|-------|-----------|---------|------------------------| +| 00 | `SUBVL=1` | default | Sub-vector length of 1 | +| 01 | `SUBVL=2` | vec2 | Sub-vector length of 2 | +| 10 | `SUBVL=3` | vec3 | Sub-vector length of 3 | +| 11 | `SUBVL=4` | vec4 | Sub-vector length of 4 | -The SUBVL encoding value may be thought of as an inclusive range of a sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore this may be considered to be elements 0b00 to 0b01 inclusive. +The SUBVL encoding value may be thought of as an inclusive range of a +sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore +this may be considered to be elements 0b00 to 0b01 inclusive. ## MASK/MASK_SRC & MASK_KIND Encoding -One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two types may not be mixed. +One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two +types may not be mixed. -Special note: to get default behaviour (SV disabled) this field must be set to zero in combination with Integer Predication also being set to 0b000. 
this has the effect of enabling "all 1s" in the predicate mask, which is equivalent to "not having any predication at all" and consequently, in combination with all other default zeros, fully disables SV. +Special note: to get default behaviour (SV disabled) this field must +be set to zero in combination with Integer Predication also being set +to 0b000. this has the effect of enabling "all 1s" in the predicate +mask, which is equivalent to "not having any predication at all" +and consequently, in combination with all other default zeros, fully +disables SV. | Value | Description | |-------|------------------------------------------------------| | 0 | MASK/MASK_SRC are encoded using Integer Predication | | 1 | MASK/MASK_SRC are encoded using CR-based Predication | -Integer Twin predication has a second set of 3 bits that uses the same encoding thus allowing either the same register (r3 or r10) to be used for both src and dest, or different regs (one for src, one for dest). +Integer Twin predication has a second set of 3 bits that uses the same +encoding thus allowing either the same register (r3 or r10) to be used +for both src and dest, or different regs (one for src, one for dest). -Likewise CR based twin predication has a second set of 3 bits, allowing a different test to be applied. +Likewise CR based twin predication has a second set of 3 bits, allowing +a different test to be applied. ### Integer Predication (MASK_KIND=0) @@ -275,7 +337,8 @@ Twin predication has an identical 3 bit field similarly encoded. ### CR-based Predication (MASK_KIND=1) -When the predicate mode bit is one the 3 bits are interpreted as below. Twin predication has an identical 3 bit field similarly encoded +When the predicate mode bit is one the 3 bits are interpreted as below. 
+Twin predication has an identical 3 bit field similarly encoded | Value | Mnemonic | Description | |-------|----------|-------------------------------------------------| @@ -288,11 +351,19 @@ When the predicate mode bit is one the 3 bits are interpreted as below. Twin pr | 110 | so/un | Element `i` is enabled if `CR[6+i].FU` is set | | 111 | ns/nu | Element `i` is enabled if `CR[6+i].FU` is clear | -CR based predication. TODO: select alternate CR for twin predication? see [[discussion]] Overlap of the two CR based predicates must be taken into account, so the starting point for one of them must be suitably high, or accept that for twin predication VL must not exceed the range where overlap will occur, *or* that they use the same starting point but select different *bits* of the same CRs +CR based predication. TODO: select alternate CR for twin predication? see +[[discussion]] Overlap of the two CR based predicates must be taken +into account, so the starting point for one of them must be suitably +high, or accept that for twin predication VL must not exceed the range +where overlap will occur, *or* that they use the same starting point +but select different *bits* of the same CRs # Twin Predication -This is a novel concept that allows predication to be applied to a single source and a single dest register. The following types of traditional Vector operations may be encoded with it, *without requiring explicit opcodes to do so* +This is a novel concept that allows predication to be applied to a single +source and a single dest register. The following types of traditional +Vector operations may be encoded with it, *without requiring explicit +opcodes to do so* * VSPLAT (a single scalar distributed across a vector) * VEXTRACT (like LLVM IR [`extractelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#extractelement-instruction)) @@ -311,28 +382,48 @@ Those patterns (and more) may be applied to: * FP fclass, fsgn, fneg, fabs, fcvt, frecip, fsqrt etc. 
* Condition Register ops mfcr, mtcr and other similar -This is a huge list that creates extremely powerful combinations, particularly given that one of the predicate options is `(1< for details. +**NOTE THIS TABLE SHOULD NO LONGER BE HAND EDITED** see + for details. -Instructions are broken down by Register Profiles as listed in the following auto-generated page: -[[opcode_regs_deduped]]. "Non-SV" indicates that the operations with this Register Profile cannot be Vectorised (mtspr, bc, dcbz, twi) +Instructions are broken down by Register Profiles as listed in the +following auto-generated page: [[opcode_regs_deduped]]. "Non-SV" +indicates that the operations with this Register Profile cannot be +Vectorised (mtspr, bc, dcbz, twi) TODO generate table which will be here [[svp64/reg_profiles]] -- 2.30.2