From 6c2cb9b1826c44d5333452138573f7ae6a4877b9 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 30 Dec 2020 15:32:47 +0000
Subject: [PATCH]

---
 openpower/sv/svp64.mdwn | 263 ++++++++++++++++++++--------------------
 1 file changed, 131 insertions(+), 132 deletions(-)

diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index ca1c6b8c1..aabbfdebb 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -205,6 +205,137 @@ Fields:
 * **N** sets signed/unsigned saturation.
 **RC1** as if Rc=1, stores CRs *but not the result*
 
+# ELWIDTH Encoding
+
+Default behaviour is set to 0b00 so that zeros follow the convention of
+"npt doing anything". In this case it means that elwidth overrides
+are not applicable. Thus if a 32 bit instruction operates on 32 bit,
+`elwidth=0b00` specifies that this behaviour is unmodified. Likewise
+when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00`
+states that, again, the behaviour is not to be modified.
+
+Only when elwidth is nonzero is the element width overridden to the
+explicitly required value.
+
+## Elwidth for Integers:
+
+| Value | Mnemonic | Description |
+|-------|----------------|------------------------------------|
+| 00 | DEFAULT | default behaviour for operation |
+| 01 | `ELWIDTH=b` | Byte: 8-bit integer |
+| 10 | `ELWIDTH=h` | Halfword: 16-bit integer |
+| 11 | `ELWIDTH=w` | Word: 32-bit integer |
+
+## Elwidth for FP Registers:
+
+| Value | Mnemonic | Description |
+|-------|----------------|------------------------------------|
+| 00 | DEFAULT | default behaviour for FP operation |
+| 01 | `ELWIDTH=bf16` | Reserved for `bf16` |
+| 10 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point |
+| 11 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point |
+
+Note:
+[`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)
+is reserved for a future implementation of SV
+
+## Elwidth for CRs:
+
+TODO, important, particularly for crops, mfcr and mtcr, what elwidth
+even means. instead it may be possible to use the bits as extra indices
+(EXTRA6) to access the full 64 CRs. TBD, several ideas
+
+The actual width of the CRs cannot be altered: they are 4 bit. Also,
+for Rc=1 operations that produce a result (in RT or FRT) and corresponding CR, it is
+the INT/FP result to which the elwidth override applies, *not* the CR.
+This therefore inherently places Rc=1 operations firmly out of scope as far as a "meaning" for elwidth on CRs is concerned.
+
+As mentioned TBD, this leaves crops etc. to have a meaning defined for
+elwidth, because these ops are pure explicit CR based.
+
+Examples: mfxm may take the extra bits and use them as extra mask bits.
+
+# SUBVL Encoding
+
+the default for SUBVL is 1 and its encoding is 0b00 to indicate that
+SUBVL is effectively disabled (a SUBVL for-loop of only one element). this
+lines up in combination with all other "default is all zeros" behaviour.
+
+| Value | Mnemonic | Subvec | Description |
+|-------|-----------|---------|------------------------|
+| 00 | `SUBVL=1` | single | Sub-vector length of 1 |
+| 01 | `SUBVL=2` | vec2 | Sub-vector length of 2 |
+| 10 | `SUBVL=3` | vec3 | Sub-vector length of 3 |
+| 11 | `SUBVL=4` | vec4 | Sub-vector length of 4 |
+
+The SUBVL encoding value may be thought of as an inclusive range of a
+sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore
+this may be considered to be elements 0b00 to 0b01 inclusive.
+
+# MASK/MASK_SRC & MASK_KIND Encoding
+
+One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two
+types may not be mixed.
+
+Special note: to get default behaviour (SV disabled) this field must
+be set to zero in combination with Integer Predication also being set
+to 0b000. this has the effect of enabling "all 1s" in the predicate
+mask, which is equivalent to "not having any predication at all"
+and consequently, in combination with all other default zeros, fully
+disables SV.
+
+| Value | Description |
+|-------|------------------------------------------------------|
+| 0 | MASK/MASK_SRC are encoded using Integer Predication |
+| 1 | MASK/MASK_SRC are encoded using CR-based Predication |
+
+Integer Twin predication has a second set of 3 bits that uses the same
+encoding thus allowing either the same register (r3 or r10) to be used
+for both src and dest, or different regs (one for src, one for dest).
+
+Likewise CR based twin predication has a second set of 3 bits, allowing
+a different test to be applied.
+
+## Integer Predication (MASK_KIND=0)
+
+When the predicate mode bit is zero the 3 bits are interpreted as below.
+Twin predication has an identical 3 bit field similarly encoded.
+
+| Value | Mnemonic | Element `i` enabled if: |
+|-------|----------|------------------------------|
+| 000 | ALWAYS | predicate effectively all 1s |
+| 001 | 1 << R3 | `i == R3` |
+| 010 | R3 | `R3 & (1 << i)` is non-zero |
+| 011 | ~R3 | `R3 & (1 << i)` is zero |
+| 100 | R10 | `R10 & (1 << i)` is non-zero |
+| 101 | ~R10 | `R10 & (1 << i)` is zero |
+| 110 | R30 | `R30 & (1 << i)` is non-zero |
+| 111 | ~R30 | `R30 & (1 << i)` is zero |
+
+## CR-based Predication (MASK_KIND=1)
+
+When the predicate mode bit is one the 3 bits are interpreted as below.
+Twin predication has an identical 3 bit field similarly encoded
+
+| Value | Mnemonic | Element `i` is enabled if |
+|-------|----------|--------------------------|
+| 000 | lt | `CR[offs+i].LT` is set |
+| 001 | nl/ge | `CR[offs+i].LT` is clear |
+| 010 | gt | `CR[offs+i].GT` is set |
+| 011 | ng/le | `CR[offs+i].GT` is clear |
+| 100 | eq | `CR[offs+i].EQ` is set |
+| 101 | ne | `CR[offs+i].EQ` is clear |
+| 110 | so/un | `CR[offs+i].FU` is set |
+| 111 | ns/nu | `CR[offs+i].FU` is clear |
+
+CR based predication. TODO: select alternate CR for twin predication? see
+[[discussion]] Overlap of the two CR based predicates must be taken
+into account, so the starting point for one of them must be suitably
+high, or accept that for twin predication VL must not exceed the range
+where overlap will occur, *or* that they use the same starting point
+but select different *bits* of the same CRs
+
+`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD).
 
 # Extra Remapped Encoding
 
@@ -385,138 +516,6 @@ Encoding shown MSB down to LSB
 | 10 | Vector | BA[4:2] | 0b000 | BA[1:0] |
 | 11 | Vector | BA[4:2] | 0b100 | BA[1:0] |
 
-# ELWIDTH Encoding
-
-Default behaviour is set to 0b00 so that zeros follow the convention of
-"npt doing anything". In this case it means that elwidth overrides
-are not applicable. Thus if a 32 bit instruction operates on 32 bit,
-`elwidth=0b00` specifies that this behaviour is unmodified. Likewise
-when a processor is switched from 64 bit to 32 bit mode, `elwidth=0b00`
-states that, again, the behaviour is not to be modified.
-
-Only when elwidth is nonzero is the element width overridden to the
-explicitly required value.
-
-## Elwidth for Integers:
-
-| Value | Mnemonic | Description |
-|-------|----------------|------------------------------------|
-| 00 | DEFAULT | default behaviour for operation |
-| 01 | `ELWIDTH=b` | Byte: 8-bit integer |
-| 10 | `ELWIDTH=h` | Halfword: 16-bit integer |
-| 11 | `ELWIDTH=w` | Word: 32-bit integer |
-
-## Elwidth for FP Registers:
-
-| Value | Mnemonic | Description |
-|-------|----------------|------------------------------------|
-| 00 | DEFAULT | default behaviour for FP operation |
-| 01 | `ELWIDTH=bf16` | Reserved for `bf16` |
-| 10 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point |
-| 11 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point |
-
-Note:
-[`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)
-is reserved for a future implementation of SV
-
-## Elwidth for CRs:
-
-TODO, important, particularly for crops, mfcr and mtcr, what elwidth
-even means. instead it may be possible to use the bits as extra indices
-(EXTRA6) to access the full 64 CRs. TBD, several ideas
-
-The actual width of the CRs cannot be altered: they are 4 bit. Also,
-for Rc=1 operations that produce a result (in RT or FRT) and corresponding CR, it is
-the INT/FP result to which the elwidth override applies, *not* the CR.
-This therefore inherently places Rc=1 operations firmly out of scope as far as a "meaning" for elwidth on CRs is concerned.
-
-As mentioned TBD, this leaves crops etc. to have a meaning defined for
-elwidth, because these ops are pure explicit CR based.
-
-Examples: mfxm may take the extra bits and use them as extra mask bits.
-
-# SUBVL Encoding
-
-the default for SUBVL is 1 and its encoding is 0b00 to indicate that
-SUBVL is effectively disabled (a SUBVL for-loop of only one element). this
-lines up in combination with all other "default is all zeros" behaviour.
-
-| Value | Mnemonic | Subvec | Description |
-|-------|-----------|---------|------------------------|
-| 00 | `SUBVL=1` | single | Sub-vector length of 1 |
-| 01 | `SUBVL=2` | vec2 | Sub-vector length of 2 |
-| 10 | `SUBVL=3` | vec3 | Sub-vector length of 3 |
-| 11 | `SUBVL=4` | vec4 | Sub-vector length of 4 |
-
-The SUBVL encoding value may be thought of as an inclusive range of a
-sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore
-this may be considered to be elements 0b00 to 0b01 inclusive.
-
-# MASK/MASK_SRC & MASK_KIND Encoding
-
-One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two
-types may not be mixed.
-
-Special note: to get default behaviour (SV disabled) this field must
-be set to zero in combination with Integer Predication also being set
-to 0b000. this has the effect of enabling "all 1s" in the predicate
-mask, which is equivalent to "not having any predication at all"
-and consequently, in combination with all other default zeros, fully
-disables SV.
-
-| Value | Description |
-|-------|------------------------------------------------------|
-| 0 | MASK/MASK_SRC are encoded using Integer Predication |
-| 1 | MASK/MASK_SRC are encoded using CR-based Predication |
-
-Integer Twin predication has a second set of 3 bits that uses the same
-encoding thus allowing either the same register (r3 or r10) to be used
-for both src and dest, or different regs (one for src, one for dest).
-
-Likewise CR based twin predication has a second set of 3 bits, allowing
-a different test to be applied.
-
-## Integer Predication (MASK_KIND=0)
-
-When the predicate mode bit is zero the 3 bits are interpreted as below.
-Twin predication has an identical 3 bit field similarly encoded.
-
-| Value | Mnemonic | Element `i` enabled if: |
-|-------|----------|------------------------------|
-| 000 | ALWAYS | predicate effectively all 1s |
-| 001 | 1 << R3 | `i == R3` |
-| 010 | R3 | `R3 & (1 << i)` is non-zero |
-| 011 | ~R3 | `R3 & (1 << i)` is zero |
-| 100 | R10 | `R10 & (1 << i)` is non-zero |
-| 101 | ~R10 | `R10 & (1 << i)` is zero |
-| 110 | R30 | `R30 & (1 << i)` is non-zero |
-| 111 | ~R30 | `R30 & (1 << i)` is zero |
-
-## CR-based Predication (MASK_KIND=1)
-
-When the predicate mode bit is one the 3 bits are interpreted as below.
-Twin predication has an identical 3 bit field similarly encoded
-
-| Value | Mnemonic | Element `i` is enabled if |
-|-------|----------|--------------------------|
-| 000 | lt | `CR[offs+i].LT` is set |
-| 001 | nl/ge | `CR[offs+i].LT` is clear |
-| 010 | gt | `CR[offs+i].GT` is set |
-| 011 | ng/le | `CR[offs+i].GT` is clear |
-| 100 | eq | `CR[offs+i].EQ` is set |
-| 101 | ne | `CR[offs+i].EQ` is clear |
-| 110 | so/un | `CR[offs+i].FU` is set |
-| 111 | ns/nu | `CR[offs+i].FU` is clear |
-
-CR based predication. TODO: select alternate CR for twin predication? see
-[[discussion]] Overlap of the two CR based predicates must be taken
-into account, so the starting point for one of them must be suitably
-high, or accept that for twin predication VL must not exceed the range
-where overlap will occur, *or* that they use the same starting point
-but select different *bits* of the same CRs
-
-`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD).
-
 # Appendix
 
 Now at its own page: [[svp64/appendix]]
-- 
2.30.2