See:
* <http://bugs.libre-soc.org/show_bug.cgi?id=127>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=899> transcendentals in simulator
* <https://bugs.libre-soc.org/show_bug.cgi?id=923> under review
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* [[power_trans_ops]] for opcode listing.
acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.
+* **Zfminmax**: Min/Max.
Minimum recommended requirements for 3D: Zftrans, Ztrignpi,
Zarctrignpi, with Ztrigpi and Zarctrigpi as augmentations.
Interestingly the only functions missing when compared to OpenCL are
compound, exp2m1, exp10m1, log2p1, log10p1, pown (integer power) and powr.
-|opcode |OpenCL FP32|OpenCL FP16|OpenCL native|IEEE754 |Power ISA |
-|------- |-----------|-----------|-------------|------- |--------- |
-|FSIN |sin |half\_sin |native\_sin |sin |NONE |
-|FCOS |cos |half\_cos |native\_cos |cos |NONE |
-|FTAN |tan |half\_tan |native\_tan |tan |NONE |
-|NONE (1)|sincos |NONE |NONE |NONE |NONE |
-|FASIN |asin |NONE |NONE |asin |NONE |
-|FACOS |acos |NONE |NONE |acos |NONE |
-|FATAN |atan |NONE |NONE |atan |NONE |
-|FSINPI |sinpi |NONE |NONE |sinPi |NONE |
-|FCOSPI |cospi |NONE |NONE |cosPi |NONE |
-|FTANPI |tanpi |NONE |NONE |tanPi |NONE |
-|FASINPI |asinpi |NONE |NONE |asinPi |NONE |
-|FACOSPI |acospi |NONE |NONE |acosPi |NONE |
-|FATANPI |atanpi |NONE |NONE |atanPi |NONE |
-|FSINH |sinh |NONE |NONE |sinh |NONE |
-|FCOSH |cosh |NONE |NONE |cosh |NONE |
-|FTANH |tanh |NONE |NONE |tanh |NONE |
-|FASINH |asinh |NONE |NONE |asinh |NONE |
-|FACOSH |acosh |NONE |NONE |acosh |NONE |
-|FATANH |atanh |NONE |NONE |atanh |NONE |
-|FATAN2 |atan2 |NONE |NONE |atan2 |NONE |
-|FATAN2PI|atan2pi |NONE |NONE |atan2pi |NONE |
-|FRSQRT |rsqrt |half\_rsqrt|native\_rsqrt|rSqrt |fsqrte, fsqrtes (4) |
-|FCBRT |cbrt |NONE |NONE |NONE (2)|NONE |
-|FEXP2 |exp2 |half\_exp2 |native\_exp2 |exp2 |NONE |
-|FLOG2 |log2 |half\_log2 |native\_log2 |log2 |NONE |
-|FEXPM1 |expm1 |NONE |NONE |expm1 |NONE |
-|FLOG1P |log1p |NONE |NONE |logp1 |NONE |
-|FEXP |exp |half\_exp |native\_exp |exp |NONE |
-|FLOG |log |half\_log |native\_log |log |NONE |
-|FEXP10 |exp10 |half\_exp10|native\_exp10|exp10 |NONE |
-|FLOG10 |log10 |half\_log10|native\_log10|log10 |NONE |
-|FPOW |pow |NONE |NONE |pow |NONE |
-|FPOWN |pown |NONE |NONE |pown |NONE |
-|FPOWR |powr |half\_powr |native\_powr |powr |NONE |
-|FROOTN |rootn |NONE |NONE |rootn |NONE |
-|FHYPOT |hypot |NONE |NONE |hypot |NONE |
-|FRECIP |NONE |half\_recip|native\_recip|NONE (3)|fre, fres (4) |
-|NONE |NONE |NONE |NONE |compound|NONE |
-|FEXP2M1 |NONE |NONE |NONE |exp2m1 |NONE |
-|FEXP10M1|NONE |NONE |NONE |exp10m1 |NONE |
-|FLOG2P1 |NONE |NONE |NONE |log2p1 |NONE |
-|FLOG10P1|NONE |NONE |NONE |log10p1 |NONE |
-
-Note (1) FSINCOS is macro-op fused (see below).
+|opcode |OpenCL FP32|OpenCL FP16|OpenCL native|IEEE754 |Power ISA |My 66000 ISA |
+|------------|-----------|-----------|-------------|-------------- |------------------------|-------------|
+|fsin |sin |half\_sin |native\_sin |sin |NONE |sin |
+|fcos |cos |half\_cos |native\_cos |cos |NONE |cos |
+|ftan |tan |half\_tan |native\_tan |tan |NONE |tan |
+|NONE (1) |sincos |NONE |NONE |NONE |NONE | |
+|fasin |asin |NONE |NONE |asin |NONE |asin |
+|facos |acos |NONE |NONE |acos |NONE |acos |
+|fatan |atan |NONE |NONE |atan |NONE |atan |
+|fsinpi |sinpi |NONE |NONE |sinPi |NONE |sinpi |
+|fcospi |cospi |NONE |NONE |cosPi |NONE |cospi |
+|ftanpi |tanpi |NONE |NONE |tanPi |NONE |tanpi |
+|fasinpi |asinpi |NONE |NONE |asinPi |NONE |asinpi |
+|facospi |acospi |NONE |NONE |acosPi |NONE |acospi |
+|fatanpi |atanpi |NONE |NONE |atanPi |NONE |atanpi |
+|fsinh |sinh |NONE |NONE |sinh |NONE | |
+|fcosh |cosh |NONE |NONE |cosh |NONE | |
+|ftanh |tanh |NONE |NONE |tanh |NONE | |
+|fasinh |asinh |NONE |NONE |asinh |NONE | |
+|facosh |acosh |NONE |NONE |acosh |NONE | |
+|fatanh |atanh |NONE |NONE |atanh |NONE | |
+|fatan2 |atan2 |NONE |NONE |atan2 |NONE |atan2 |
+|fatan2pi |atan2pi |NONE |NONE |atan2pi |NONE |atan2pi |
+|frsqrt |rsqrt |half\_rsqrt|native\_rsqrt|rSqrt |fsqrte, fsqrtes (4) |rsqrt |
+|fcbrt |cbrt |NONE |NONE |NONE (2) |NONE | |
+|fexp2 |exp2 |half\_exp2 |native\_exp2 |exp2 |NONE |exp2 |
+|flog2 |log2 |half\_log2 |native\_log2 |log2 |NONE |ln2 |
+|fexpm1 |expm1 |NONE |NONE |expm1 |NONE |expm1 |
+|flog1p |log1p |NONE |NONE |logp1 |NONE |logp1 |
+|fexp |exp |half\_exp |native\_exp |exp |NONE |exp |
+|flog |log |half\_log |native\_log |log |NONE |ln |
+|fexp10 |exp10 |half\_exp10|native\_exp10|exp10 |NONE |exp10 |
+|flog10 |log10 |half\_log10|native\_log10|log10 |NONE |log |
+|fpow |pow |NONE |NONE |pow |NONE |pow |
+|fpown |pown |NONE |NONE |pown |NONE | |
+|fpowr |powr |half\_powr |native\_powr |powr |NONE | |
+|frootn |rootn |NONE |NONE |rootn |NONE | |
+|fhypot |hypot |NONE |NONE |hypot |NONE | |
+|frecip |NONE |half\_recip|native\_recip|NONE (3) |fre, fres (4) |rcp |
+|NONE |NONE |NONE |NONE |compound |NONE | |
+|fexp2m1 |NONE |NONE |NONE |exp2m1 |NONE |exp2m1 |
+|fexp10m1 |NONE |NONE |NONE |exp10m1 |NONE |exp10m1 |
+|flog2p1 |NONE |NONE |NONE |log2p1 |NONE |ln2p1 |
+|flog10p1 |NONE |NONE |NONE |log10p1 |NONE |logp1 |
+|fminnum08 |fmin |fmin |NONE |minNum |xsmindp (5) | |
+|fmaxnum08 |fmax |fmax |NONE |maxNum |xsmaxdp (5) | |
+|fmin19 |fmin |fmin |NONE |minimum |NONE |fmin |
+|fmax19 |fmax |fmax |NONE |maximum |NONE |fmax |
+|fminnum19 |fmin |fmin |NONE |minimumNumber |vminfp (6), xsminjdp (5)| |
+|fmaxnum19 |fmax |fmax |NONE |maximumNumber |vmaxfp (6), xsmaxjdp (5)| |
+|fminc |fmin |fmin |NONE |NONE |xsmincdp (5) |fmin* |
+|fmaxc |fmax |fmax |NONE |NONE |xsmaxcdp (5) |fmax* |
+|fminmagnum08|minmag |minmag |NONE |minNumMag |NONE | |
+|fmaxmagnum08|maxmag |maxmag |NONE |maxNumMag |NONE | |
+|fminmag19 |minmag |minmag |NONE |minimumMagnitude |NONE | |
+|fmaxmag19 |maxmag |maxmag |NONE |maximumMagnitude |NONE | |
+|fminmagnum19|minmag |minmag |NONE |minimumMagnitudeNumber|NONE | |
+|fmaxmagnum19|maxmag |maxmag |NONE |maximumMagnitudeNumber|NONE | |
+|fminmagc |minmag |minmag |NONE |NONE |NONE | |
+|fmaxmagc |maxmag |maxmag |NONE |NONE |NONE | |
+|fmod |fmod |fmod | |NONE |NONE | |
+|fremainder |remainder |remainder | |remainder |NONE | |
+
+From Mitch Alsup:
+
+* Brian's LLVM compiler converts fminc and fmaxc into fmin and fmax instructions.
+* These are all IEEE 754-2019 compliant.
+* These are native instructions, not extensions.
+* All listed functions are available in both F32 and F64 formats.
+* There is some confusion (in my head) about fmin and fmax. I intend both instructions to perform 754-2019 semantics,
+  but I don't know if this is minimum/maximum or minimumNumber/maximumNumber.
+* fmad and remainder are a 2-instruction sequence; I don't know how to "edit it in".
+
+
+Note (1) fsincos is macro-op fused (see below).
Note (2) synthesised in IEEE754-2019 as "rootn(x, 3)"
Note (4) these are estimate opcodes that help accelerate
software emulation
+Note (5) f64-only (though can be used on f32 stored in f64 format), requires VSX.
+
+Note (6) 4xf32-only, requires VMX.
+
## List of 2-arg opcodes
-| opcode | Description | pseudocode | Extension |
-| ------ | ---------------- | ---------------- | ----------- |
-| FATAN2 | atan2 arc tangent | FRT = atan2(FRB, FRA) | Zarctrignpi |
-| FATAN2PI | atan2 arc tangent / pi | FRT = atan2(FRB, FRA) / pi | Zarctrigpi |
-| FPOW | x power of y | FRT = pow(FRA, FRB) | ZftransAdv |
-| FPOWN | x power of n (n int) | FRT = pow(FRA, RB) | ZftransAdv |
-| FPOWR | x power of y (x +ve) | FRT = exp(FRA log(FRB)) | ZftransAdv |
-| FROOTN | x power 1/n (n integer)| FRT = pow(FRA, 1/RB) | ZftransAdv |
-| FHYPOT | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv |
+| opcode | Description | pseudocode | Extension |
+| ------ | ---------------- | ---------------- | ----------- |
+| fatan2 | atan2 arc tangent | FRT = atan2(FRB, FRA) | Zarctrignpi |
+| fatan2pi | atan2 arc tangent / pi | FRT = atan2(FRB, FRA) / pi | Zarctrigpi |
+| fpow | x power of y | FRT = pow(FRA, FRB) | ZftransAdv |
+| fpown | x power of n (n int) | FRT = pow(FRA, RB) | ZftransAdv |
+| fpowr | x power of y (x +ve) | FRT = exp(FRB * log(FRA)) | ZftransAdv |
+| frootn | x power 1/n (n integer) | FRT = pow(FRA, 1/RB) | ZftransAdv |
+| fhypot | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv |
+| fminnum08 | IEEE 754-2008 minNum | FRT = minNum(FRA, FRB) (1) | Zfminmax |
+| fmaxnum08 | IEEE 754-2008 maxNum | FRT = maxNum(FRA, FRB) (1) | Zfminmax |
+| fmin19 | IEEE 754-2019 minimum | FRT = minimum(FRA, FRB) | Zfminmax |
+| fmax19 | IEEE 754-2019 maximum | FRT = maximum(FRA, FRB) | Zfminmax |
+| fminnum19 | IEEE 754-2019 minimumNumber | FRT = minimumNumber(FRA, FRB) | Zfminmax |
+| fmaxnum19 | IEEE 754-2019 maximumNumber | FRT = maximumNumber(FRA, FRB) | Zfminmax |
+| fminc | C ternary-op minimum | FRT = FRA \< FRB ? FRA : FRB | Zfminmax |
+| fmaxc | C ternary-op maximum | FRT = FRA > FRB ? FRA : FRB | Zfminmax |
+| fminmagnum08 | IEEE 754-2008 minNumMag | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)| Zfminmax |
+| fmaxmagnum08 | IEEE 754-2008 maxNumMag | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) | Zfminmax |
+| fminmag19 | IEEE 754-2019 minimumMagnitude | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) | Zfminmax |
+| fmaxmag19 | IEEE 754-2019 maximumMagnitude | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) | Zfminmax |
+| fminmagnum19 | IEEE 754-2019 minimumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)| Zfminmax |
+| fmaxmagnum19 | IEEE 754-2019 maximumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) | Zfminmax |
+| fminmagc | C ternary-op minimum magnitude | FRT = minmaxmag(FRA, FRB, False, fminc) (2) | Zfminmax |
+| fmaxmagc | C ternary-op maximum magnitude | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) | Zfminmax |
+| fmod | modulus | FRT = fmod(FRA, FRB) | ZftransExt |
+| fremainder | IEEE 754 remainder | FRT = remainder(FRA, FRB) | ZftransExt |
+
+Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than +0.0. This is left unspecified in IEEE 754-2008.
+
+Note (2): minmaxmag(x, y, is_max, fallback) is defined as:
+
+```python
+def minmaxmag(x, y, is_max, fallback):
+    a = abs(x) < abs(y)
+    b = abs(x) > abs(y)
+    if is_max:
+        a, b = b, a  # swap
+    if a:
+        return x
+    if b:
+        return y
+    # equal magnitudes, or NaN input(s)
+    return fallback(x, y)
+```
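
The difference between the min/max flavours listed above shows up mostly with NaN and signed-zero inputs. The following is a minimal Python sketch of three of the minimum variants, written from the pseudocode column and the IEEE 754-2019 definitions. The Python function names merely mirror the opcode names; they are illustrative models, not the simulator's implementation, and signalling NaNs are deliberately not modelled (Python floats do not expose them).

```python
import math

def fminc(a, b):
    # C ternary-op minimum: returns b whenever the comparison is false,
    # which includes every case where either input is NaN.
    return a if a < b else b

def fmin19(a, b):
    # IEEE 754-2019 minimum: any NaN input gives NaN; -0.0 orders below +0.0.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Only +0.0 and -0.0 compare equal while differing in sign.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def fminnum19(a, b):
    # IEEE 754-2019 minimumNumber: a single NaN input is ignored.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return fmin19(a, b)

def minmaxmag(x, y, is_max, fallback):
    # Same definition as Note (2): compare magnitudes, defer ties and NaNs.
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    return fallback(x, y)

print(fminc(math.nan, 1.0))                    # 1.0
print(fminc(1.0, math.nan))                    # nan
print(fmin19(1.0, math.nan))                   # nan
print(fminnum19(1.0, math.nan))                # 1.0
print(minmaxmag(-3.0, 2.0, False, fminnum19))  # 2.0
print(minmaxmag(-3.0, 3.0, False, fminnum19))  # -3.0
```

Note how `fminc` is not commutative in the presence of NaN, which is the main reason it is listed separately from the IEEE 754 variants.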
## List of 1-arg transcendental opcodes
| opcode | Description | pseudocode | Extension |
| ------ | ---------------- | ---------------- | ---------- |
-| FRSQRT | Reciprocal Square-root | FRT = sqrt(FRA) | Zfrsqrt |
-| FCBRT | Cube Root | FRT = pow(FRA, 1.0 / 3) | ZftransAdv |
-| FRECIP | Reciprocal | FRT = 1.0 / FRA | Zftrans |
-| FEXP2M1 | power-2 minus 1 | FRT = pow(2, FRA) - 1.0 | ZftransExt |
-| FLOG2P1 | log2 plus 1 | FRT = log(2, 1 + FRA) | ZftransExt |
-| FEXP2 | power-of-2 | FRT = pow(2, FRA) | Zftrans |
-| FLOG2 | log2 | FRT = log(2. FRA) | Zftrans |
-| FEXPM1 | exponential minus 1 | FRT = pow(e, FRA) - 1.0 | ZftransExt |
-| FLOG1P | log plus 1 | FRT = log(e, 1 + FRA) | ZftransExt |
-| FEXP | exponential | FRT = pow(e, FRA) | ZftransExt |
-| FLOG | natural log (base e) | FRT = log(e, FRA) | ZftransExt |
-| FEXP10M1 | power-10 minus 1 | FRT = pow(10, FRA) - 1.0 | ZftransExt |
-| FLOG10P1 | log10 plus 1 | FRT = log(10, 1 + FRA) | ZftransExt |
-| FEXP10 | power-of-10 | FRT = pow(10, FRA) | ZftransExt |
-| FLOG10 | log base 10 | FRT = log(10, FRA) | ZftransExt |
+| frsqrt | Reciprocal Square-root | FRT = 1.0 / sqrt(FRA) | Zfrsqrt |
+| fcbrt | Cube Root | FRT = pow(FRA, 1.0 / 3) | ZftransAdv |
+| frecip | Reciprocal | FRT = 1.0 / FRA | Zftrans |
+| fexp2m1 | power-2 minus 1 | FRT = pow(2, FRA) - 1.0 | ZftransExt |
+| flog2p1 | log2 plus 1 | FRT = log(2, 1 + FRA) | ZftransExt |
+| fexp2 | power-of-2 | FRT = pow(2, FRA) | Zftrans |
+| flog2 | log2 | FRT = log(2, FRA) | Zftrans |
+| fexpm1 | exponential minus 1 | FRT = pow(e, FRA) - 1.0 | ZftransExt |
+| flog1p | log plus 1 | FRT = log(e, 1 + FRA) | ZftransExt |
+| fexp | exponential | FRT = pow(e, FRA) | ZftransExt |
+| flog | natural log (base e) | FRT = log(e, FRA) | ZftransExt |
+| fexp10m1 | power-10 minus 1 | FRT = pow(10, FRA) - 1.0 | ZftransExt |
+| flog10p1 | log10 plus 1 | FRT = log(10, 1 + FRA) | ZftransExt |
+| fexp10 | power-of-10 | FRT = pow(10, FRA) | ZftransExt |
+| flog10 | log base 10 | FRT = log(10, FRA) | ZftransExt |
## List of 1-arg trigonometric opcodes
| opcode | Description | pseudocode | Extension |
| -------- | ------------------------ | ------------------------ | ----------- |
-| FSIN | sin (radians) | FRT = sin(FRA) | Ztrignpi |
-| FCOS | cos (radians) | FRT = cos(FRA) | Ztrignpi |
-| FTAN | tan (radians) | FRT = tan(FRA) | Ztrignpi |
-| FASIN | arcsin (radians) | FRT = asin(FRA) | Zarctrignpi |
-| FACOS | arccos (radians) | FRT = acos(FRA) | Zarctrignpi |
-| FATAN | arctan (radians) | FRT = atan(FRA) | Zarctrignpi |
-| FSINPI | sin times pi | FRT = sin(pi * FRA) | Ztrigpi |
-| FCOSPI | cos times pi | FRT = cos(pi * FRA) | Ztrigpi |
-| FTANPI | tan times pi | FRT = tan(pi * FRA) | Ztrigpi |
-| FASINPI | arcsin / pi | FRT = asin(FRA) / pi | Zarctrigpi |
-| FACOSPI | arccos / pi | FRT = acos(FRA) / pi | Zarctrigpi |
-| FATANPI | arctan / pi | FRT = atan(FRA) / pi | Zarctrigpi |
-| FSINH | hyperbolic sin (radians) | FRT = sinh(FRA) | Zfhyp |
-| FCOSH | hyperbolic cos (radians) | FRT = cosh(FRA) | Zfhyp |
-| FTANH | hyperbolic tan (radians) | FRT = tanh(FRA) | Zfhyp |
-| FASINH | inverse hyperbolic sin | FRT = asinh(FRA) | Zfhyp |
-| FACOSH | inverse hyperbolic cos | FRT = acosh(FRA) | Zfhyp |
-| FATANH | inverse hyperbolic tan | FRT = atanh(FRA) | Zfhyp |
+| fsin | sin (radians) | FRT = sin(FRA) | Ztrignpi |
+| fcos | cos (radians) | FRT = cos(FRA) | Ztrignpi |
+| ftan | tan (radians) | FRT = tan(FRA) | Ztrignpi |
+| fasin | arcsin (radians) | FRT = asin(FRA) | Zarctrignpi |
+| facos | arccos (radians) | FRT = acos(FRA) | Zarctrignpi |
+| fatan | arctan (radians) | FRT = atan(FRA) | Zarctrignpi |
+| fsinpi | sin times pi | FRT = sin(pi * FRA) | Ztrigpi |
+| fcospi | cos times pi | FRT = cos(pi * FRA) | Ztrigpi |
+| ftanpi | tan times pi | FRT = tan(pi * FRA) | Ztrigpi |
+| fasinpi | arcsin / pi | FRT = asin(FRA) / pi | Zarctrigpi |
+| facospi | arccos / pi | FRT = acos(FRA) / pi | Zarctrigpi |
+| fatanpi | arctan / pi | FRT = atan(FRA) / pi | Zarctrigpi |
+| fsinh | hyperbolic sin (radians) | FRT = sinh(FRA) | Zfhyp |
+| fcosh | hyperbolic cos (radians) | FRT = cosh(FRA) | Zfhyp |
+| ftanh | hyperbolic tan (radians) | FRT = tanh(FRA) | Zfhyp |
+| fasinh | inverse hyperbolic sin | FRT = asinh(FRA) | Zfhyp |
+| facosh | inverse hyperbolic cos | FRT = acosh(FRA) | Zfhyp |
+| fatanh | inverse hyperbolic tan | FRT = atanh(FRA) | Zfhyp |
[[!inline pages="openpower/power_trans_ops" raw=yes ]]
MALI Midgard, an embedded / mobile 3D GPU, for example only has the
following opcodes:
+ 28 - fmin
+ 2C - fmax
E8 - fatan_pt2
F0 - frcp (reciprocal)
F2 - frsqrt (inverse square root, 1/sqrt(x))
<https://github.com/laanwj/etna_viv/blob/master/rnndb/isa.xml>)
only has the following:
+ fmin/fmax (implemented using SELECT)
sin, cos2pi
cos, sin2pi
log2, exp
AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the
RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
+ MIN/MAX/MIN_DX10/MAX_DX10
COS2PI (appx)
EXP2
LOG (IEEE754)
SIN2PI (appx)
AMD RDNA has F16 and F32 variants of all the above, and also has F64
-variants of SQRT, RSQRT and RECIP. It is interesting that even the
+variants of SQRT, RSQRT, MIN, MAX, and RECIP. It is interesting that even the
modern high-end AMD GPU does not have TAN or ATAN, where MALI Midgard
does.
### ZftransExt
-LOG, EXP, EXP10, LOG10, LOGP1, EXP1M
+LOG, EXP, EXP10, LOG10, LOGP1, EXPM1, fmod, fremainder
These are extra transcendental functions that are useful, not generally
needed for 3D, however for Numerical Computation they may be useful.
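
As a brief illustration of how fmod and fremainder differ, the sketch below uses Python's `math.fmod` and `math.remainder` (Python 3.7 or later). It assumes the two opcodes follow the C `fmod` and IEEE 754 `remainder` definitions that their names suggest; that mapping is an assumption made for this example.

```python
import math

x, y = 5.5, 2.0

# fmod: quotient truncated toward zero, so the result takes the sign of x.
r_fmod = math.fmod(x, y)            # 5.5 - 2 * 2.0 = 1.5

# remainder: quotient rounded to nearest (ties to even), so the result
# lies in [-|y|/2, +|y|/2] and may be negative for positive x.
r_rem = math.remainder(x, y)        # 5.5 - 3 * 2.0 = -0.5

r_fmod_neg = math.fmod(-x, y)       # -1.5
r_rem_neg = math.remainder(-x, y)   # 0.5

print(r_fmod, r_rem, r_fmod_neg, r_rem_neg)
```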
These are simply much more complex to implement in hardware, and typically
will only be put into HPC applications.
+Note that `pow` is commonly used in Blinn-Phong shading (the shading model used
+by OpenGL 1.0 and commonly used by shader authors that need basic 3D graphics
+with specular highlights), however it can be sufficiently emulated using
+`pow(b, n) = exp2(n*log2(b))`.
+
* **Zfrsqrt**: Reciprocal square-root.
## Trigonometric subsets
(programmerjake: actually, all other GPU ISAs mentioned in this document have sinpi/cospi or equivalent, and often not sin/cos, because sinpi/cospi are actually *waay* easier to implement because range reduction is simply a bitwise mask, whereas for sin/cos range reduction is a full division by pi)
+(Mitch: My patent USPTO 10,761,806 shows that the above statement is no longer true.)
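
For the sinpi side of this discussion only, the following Python sketch shows why range reduction is cheap when the argument is already expressed in units of pi: sin(pi * x) has period 2 in x, and reducing x modulo 2 is exact in floating point. `math.fmod` stands in here for what would be a mask on the integer bits in hardware; the function name is hypothetical and the final `math.sin` call is not a model of any hardware algorithm.

```python
import math

def sinpi_sketch(x):
    # fmod is exact for floats, so the reduction to (-2, 2) adds no
    # rounding error however large x is.
    r = math.fmod(x, 2.0)
    return math.sin(math.pi * r)

# 2**40 is an even integer, so only the fractional 0.25 survives reduction.
print(sinpi_sketch(2.0**40 + 0.25))
```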
+
+
In the case of the Ztrigpi subset, these are commonly used in for loops
with a power of two number of subdivisions, and the cost of multiplying
by PI inside each loop (or cumulative addition, resulting in cumulative
Therefore they are their own subset extensions.
+### Zfminmax
+
+* fminnum08 fmaxnum08
+* fmin19 fmax19
+* fminnum19 fmaxnum19
+* fminc fmaxc
+* fminmagnum08 fmaxmagnum08
+* fminmag19 fmaxmag19
+* fminmagnum19 fmaxmagnum19
+* fminmagc fmaxmagc
+
+These are commonly used for vector reductions, where having them be a single
+instruction is critical. They are also commonly used in GPU shaders, HPC, and
+general-purpose FP algorithms.
+
+These min and max operations are quite cheap to implement hardware-wise,
+being comparable in cost to fcmp + some muxes. They're all in one extension
+because once you implement some of them, the rest require only slightly more
+hardware complexity.
+
+Therefore they are their own subset extension.
+
# Synthesis, Pseudo-code ops and macro-ops
The pseudo-ops are best left up to the compiler rather than being actual
(loop invariant) set to "1.0" at the beginning of a function or other
suitable code block.
-* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
-* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).
+* fsincos - fused macro-op between fsin and fcos (issued in that order).
+* fsincospi - fused macro-op between fsinpi and fcospi (issued in that order).
-FATANPI example pseudo-code:
+fatanpi example pseudo-code:
- fmvis ft0, 0x3F800 // upper bits of f32 1.0 (BF16)
+ fmvis ft0, 0x3F80 // upper bits of f32 1.0 (BF16)
fatan2pis FRT, FRA, ft0
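
The constant in the example above can be checked with a short Python sketch: BF16 is the upper 16 bits of an IEEE 754 binary32 value, so `0x3F80` expands to `0x3F800000`, which is f32 1.0. The helper names below are hypothetical. The second function only shows the mathematical identity atan(x)/pi = atan2(x, 1.0)/pi that the synthesis relies on, and does not model the instruction's operand order.

```python
import math
import struct

def bf16_bits_to_f32(bits16):
    # Place the 16 BF16 bits in the top half of a 32-bit word and
    # reinterpret that word as an IEEE 754 binary32 value.
    word = (bits16 & 0xFFFF) << 16
    return struct.unpack(">f", struct.pack(">I", word))[0]

def atanpi_via_atan2pi(x):
    # atan(x) / pi expressed through a two-argument arctangent with 1.0.
    return math.atan2(x, 1.0) / math.pi

print(bf16_bits_to_f32(0x3F80))   # 1.0
print(atanpi_via_atan2pi(1.0))    # 0.25
```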
Hyperbolic function example (obviates need for Zfhyp except for
ASINH( x ) = ln( x + SQRT(x**2+1))
+`pow` sufficient for 3D Graphics:
+
+ pow(b, x) = exp2(x * log2(b))
+
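
A minimal Python sketch of the identity above, assuming a strictly positive base (as is the case for the clamped dot product raised to a shininess exponent in Blinn-Phong shading). The function name is hypothetical, and `2.0 ** t` is used in place of an exp2 call so that the example runs on older Python versions.

```python
import math

def pow_via_exp2(b, x):
    # Valid only for b > 0: log2 is undefined for zero or negative bases,
    # so those cases need separate handling in a real implementation.
    return 2.0 ** (x * math.log2(b))

# Example: a specular term with shininess 32.
print(pow_via_exp2(0.9, 32.0), 0.9 ** 32.0)
```

The two printed values agree to within normal floating-point rounding, which is the sense in which the emulation is sufficient for 3D graphics.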
# Evaluation and commentary
Moved to [[discussion]]