fpowr(s), TBD, low, 10, yes, EXT2xx, no, transcendentals, 2R1W1w
frootn(s), TBD, low, 10, yes, EXT2xx, no, transcendentals, 2R1W1w
fhypot(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmin19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmax19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmagnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmagnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmag19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmag19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmagnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmagnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmagc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmagc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmin19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmax19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmagnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmagnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmag19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmag19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmagnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmagnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmagc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmagc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
fmod(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
fremainder(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.
+* **Zfminmax**: Min/Max.
Minimum recommended requirements for 3D: Zftrans, Ztrignpi,
Zarctrignpi, with Ztrigpi and Zarctrigpi as augmentations.
| fpowr | x power of y (x +ve) | FRT = exp(FRA log(FRB)) | ZftransAdv |
| frootn | x power 1/n (n integer) | FRT = pow(FRA, 1/RB) | ZftransAdv |
| fhypot | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv |
-| fminnum08 | IEEE 754-2008 minNum | FRT = minNum(FRA, FRB) (1) | TBD |
-| fmaxnum08 | IEEE 754-2008 maxNum | FRT = maxNum(FRA, FRB) (1) | TBD |
-| fmin19 | IEEE 754-2019 minimum | FRT = minimum(FRA, FRB) | TBD |
-| fmax19 | IEEE 754-2019 maximum | FRT = maximum(FRA, FRB) | TBD |
-| fminnum19 | IEEE 754-2019 minimumNumber | FRT = minimumNumber(FRA, FRB) | TBD |
-| fmaxnum19 | IEEE 754-2019 maximumNumber | FRT = maximumNumber(FRA, FRB) | TBD |
-| fminc | C ternary-op minimum | FRT = FRA \< FRB ? FRA : FRB | TBD |
-| fmaxc | C ternary-op maximum | FRT = FRA > FRB ? FRA : FRB | TBD |
-| fminmagnum08 | IEEE 754-2008 minNumMag | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)| TBD |
-| fmaxmagnum08 | IEEE 754-2008 maxNumMag | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) | TBD |
-| fminmag19 | IEEE 754-2019 minimumMagnitude | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) | TBD |
-| fmaxmag19 | IEEE 754-2019 maximumMagnitude | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) | TBD |
-| fminmagnum19 | IEEE 754-2019 minimumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)| TBD |
-| fmaxmagnum19 | IEEE 754-2019 maximumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) | TBD |
-| fminmagc | C ternary-op minimum magnitude | FRT = minmaxmag(FRA, FRB, False, fminc) (2) | TBD |
-| fmaxmagc | C ternary-op maximum magnitude | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) | TBD |
+| fminnum08 | IEEE 754-2008 minNum | FRT = minNum(FRA, FRB) (1) | Zfminmax |
+| fmaxnum08 | IEEE 754-2008 maxNum | FRT = maxNum(FRA, FRB) (1) | Zfminmax |
+| fmin19 | IEEE 754-2019 minimum | FRT = minimum(FRA, FRB) | Zfminmax |
+| fmax19 | IEEE 754-2019 maximum | FRT = maximum(FRA, FRB) | Zfminmax |
+| fminnum19 | IEEE 754-2019 minimumNumber | FRT = minimumNumber(FRA, FRB) | Zfminmax |
+| fmaxnum19 | IEEE 754-2019 maximumNumber | FRT = maximumNumber(FRA, FRB) | Zfminmax |
+| fminc | C ternary-op minimum | FRT = FRA \< FRB ? FRA : FRB | Zfminmax |
+| fmaxc | C ternary-op maximum | FRT = FRA > FRB ? FRA : FRB | Zfminmax |
+| fminmagnum08 | IEEE 754-2008 minNumMag | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)| Zfminmax |
+| fmaxmagnum08 | IEEE 754-2008 maxNumMag | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) | Zfminmax |
+| fminmag19 | IEEE 754-2019 minimumMagnitude | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) | Zfminmax |
+| fmaxmag19 | IEEE 754-2019 maximumMagnitude | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) | Zfminmax |
+| fminmagnum19 | IEEE 754-2019 minimumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)| Zfminmax |
+| fmaxmagnum19 | IEEE 754-2019 maximumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) | Zfminmax |
+| fminmagc | C ternary-op minimum magnitude | FRT = minmaxmag(FRA, FRB, False, fminc) (2) | Zfminmax |
+| fmaxmagc | C ternary-op maximum magnitude | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) | Zfminmax |
| fmod | modulus | FRT = fmod(FRA, FRB) | TBD |
| fremainder | IEEE 754 remainder | FRT = remainder(FRA, FRB) | TBD |
MALI Midgard, an embedded / mobile 3D GPU, for example only has the
following opcodes:
+ 28 - fmin
+ 2C - fmax
E8 - fatan_pt2
F0 - frcp (reciprocal)
F2 - frsqrt (inverse square root, 1/sqrt(x))
<https://github.com/laanwj/etna_viv/blob/master/rnndb/isa.xml>)
only has the following:
+ fmin/fmax (implemented using SELECT)
sin, cos2pi
cos, sin2pi
log2, exp
AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the
RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
+ MIN/MAX/MIN_DX10/MAX_DX10
COS2PI (appx)
EXP2
LOG (IEEE754)
SIN2PI (appx)
AMD RDNA has F16 and F32 variants of all the above, and also has F64
-variants of SQRT, RSQRT and RECIP. It is interesting that even the
+variants of SQRT, RSQRT, MIN, MAX, and RECIP. It is interesting that even the
modern high-end AMD GPU does not have TAN or ATAN, whereas MALI Midgard
does.
Therefore they are their own subset extensions.
+### Zfminmax
+
+* fminnum08 fmaxnum08
+* fmin19 fmax19
+* fminnum19 fmaxnum19
+* fminc fmaxc
+* fminmagnum08 fmaxmagnum08
+* fminmag19 fmaxmag19
+* fminmagnum19 fmaxmagnum19
+* fminmagc fmaxmagc
+
+These are commonly used in vector reductions, where it is critical that each
+one is a single instruction. They are also widely used in GPU shaders, HPC,
+and general-purpose FP algorithms.
+
+These min and max operations are cheap to implement in hardware, with a cost
+comparable to fcmp plus a few muxes. They are all in one extension because,
+once some of them are implemented, the rest add only slightly more hardware
+complexity.
+
+Therefore they are their own subset extension.
+
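As a rough illustration of how the Zfminmax variants differ in their NaN and signed-zero handling, the following Python sketch models three of the min flavours plus the magnitude wrapper. It is not a normative definition. Signaling NaNs are ignored, the Python function names simply reuse the instruction mnemonics, and the body of `minmaxmag` is an assumption inferred only from the argument list shown in the table (operands, an is-max flag, and a fallback operation).

```python
import math


def fminc(a, b):
    """C ternary-op minimum: a < b ? a : b (any NaN makes the compare false)."""
    return a if a < b else b


def fminnum19(a, b):
    """Model of IEEE 754-2019 minimumNumber (signaling NaNs not modelled)."""
    if math.isnan(a):
        return b  # a NaN operand is treated as missing data
    if math.isnan(b):
        return a
    if a == b:
        # Only -0.0 vs +0.0 can differ here; -0.0 is treated as the smaller.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def fmin19(a, b):
    """Model of IEEE 754-2019 minimum: any NaN operand gives NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return fminnum19(a, b)


def minmaxmag(a, b, is_max, fallback):
    """Assumed shape of the magnitude helper: compare |a| and |b|, and defer
    to `fallback` when the magnitudes are equal or unordered (NaN)."""
    first = abs(a) < abs(b)
    second = abs(a) > abs(b)
    if is_max:
        first, second = second, first
    if first:
        return a
    if second:
        return b
    return fallback(a, b)


nan = math.nan
print(fminc(nan, 1.0), fminc(1.0, nan))       # ternary result depends on operand order
print(fminnum19(nan, 1.0), fmin19(nan, 1.0))  # number wins vs NaN propagates
print(minmaxmag(-3.0, 2.0, False, fminnum19)) # smaller magnitude is selected
```

The ternary form is the only one whose result depends on which operand holds the NaN, which is the main reason the variants are listed as distinct operations rather than aliases of each other.
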
# Synthesis, Pseudo-code ops and macro-ops
The pseudo-ops are best left up to the compiler rather than being actual