X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Ftranscendentals.mdwn;h=9e6b86118b1f2687e3a14d18851165041faa155b;hb=7881bad4de0c9b5abc3fe7db2ae65181f7369bd4;hp=4d89ad7b23610263f1253f8bfc97201983e77bfb;hpb=b59f1b7f8cd6540716110dfbe355ccfc2f2638fe;p=libreriscv.git diff --git a/openpower/transcendentals.mdwn b/openpower/transcendentals.mdwn index 4d89ad7b2..9e6b86118 100644 --- a/openpower/transcendentals.mdwn +++ b/openpower/transcendentals.mdwn @@ -7,8 +7,9 @@ add IEEE754 transcendental functions (pow, log etc) and trigonometric functions (sin, cos etc). These functions are also 98% shared with the Khronos Group OpenCL Extended Instruction Set.* -With thanks to: +Authors/Contributors: +* Luke Kenneth Casson Leighton * Jacob Lifshay * Dan Petroski * Mitch Alsup @@ -21,6 +22,8 @@ With thanks to: See: * +* transcendentals in simulator +* under review * * [[power_trans_ops]] for opcode listing. @@ -39,6 +42,7 @@ TODO: rename extension subsets -- we're not on RISC-V anymore. acosh, atanh (can be synthesised - see below) * **ZftransAdv**: much more complex to implement in hardware * **Zfrsqrt**: Reciprocal square-root. +* **Zfminmax**: Min/Max. Minimum recommended requirements for 3D: Zftrans, Ztrignpi, Zarctrignpi, with Ztrigpi and Zarctrigpi as augmentations. @@ -132,52 +136,81 @@ IEEE754-2019 Table 9.1 lists "additional mathematical operations". Interestingly the only functions missing when compared to OpenCL are compound, exp2m1, exp10m1, log2p1, log10p1, pown (integer power) and powr. 
-|opcode |OpenCL FP32|OpenCL FP16|OpenCL native|OpenCL fast|IEEE754 |Power ISA | -|------- |-----------|-----------|-------------|-----------|------- |--------- | -|FSIN |sin |half\_sin |native\_sin |NONE |sin |NONE | -|FCOS |cos |half\_cos |native\_cos |NONE |cos |NONE | -|FTAN |tan |half\_tan |native\_tan |NONE |tan |NONE | -|NONE (1)|sincos |NONE |NONE |NONE |NONE |NONE | -|FASIN |asin |NONE |NONE |NONE |asin |NONE | -|FACOS |acos |NONE |NONE |NONE |acos |NONE | -|FATAN |atan |NONE |NONE |NONE |atan |NONE | -|FSINPI |sinpi |NONE |NONE |NONE |sinPi |NONE | -|FCOSPI |cospi |NONE |NONE |NONE |cosPi |NONE | -|FTANPI |tanpi |NONE |NONE |NONE |tanPi |NONE | -|FASINPI |asinpi |NONE |NONE |NONE |asinPi |NONE | -|FACOSPI |acospi |NONE |NONE |NONE |acosPi |NONE | -|FATANPI |atanpi |NONE |NONE |NONE |atanPi |NONE | -|FSINH |sinh |NONE |NONE |NONE |sinh |NONE | -|FCOSH |cosh |NONE |NONE |NONE |cosh |NONE | -|FTANH |tanh |NONE |NONE |NONE |tanh |NONE | -|FASINH |asinh |NONE |NONE |NONE |asinh |NONE | -|FACOSH |acosh |NONE |NONE |NONE |acosh |NONE | -|FATANH |atanh |NONE |NONE |NONE |atanh |NONE | -|FATAN2 |atan2 |NONE |NONE |NONE |atan2 |NONE | -|FATAN2PI|atan2pi |NONE |NONE |NONE |atan2pi |NONE | -|FRSQRT |rsqrt |half\_rsqrt|native\_rsqrt|NONE |rSqrt |fsqrte, fsqrtes (4) | -|FCBRT |cbrt |NONE |NONE |NONE |NONE (2)|NONE | -|FEXP2 |exp2 |half\_exp2 |native\_exp2 |NONE |exp2 |NONE | -|FLOG2 |log2 |half\_log2 |native\_log2 |NONE |log2 |NONE | -|FEXPM1 |expm1 |NONE |NONE |NONE |expm1 |NONE | -|FLOG1P |log1p |NONE |NONE |NONE |logp1 |NONE | -|FEXP |exp |half\_exp |native\_exp |NONE |exp |NONE | -|FLOG |log |half\_log |native\_log |NONE |log |NONE | -|FEXP10 |exp10 |half\_exp10|native\_exp10|NONE |exp10 |NONE | -|FLOG10 |log10 |half\_log10|native\_log10|NONE |log10 |NONE | -|FPOW |pow |NONE |NONE |NONE |pow |NONE | -|FPOWN |pown |NONE |NONE |NONE |pown |NONE | -|FPOWR |powr |half\_powr |native\_powr |NONE |powr |NONE | -|FROOTN |rootn |NONE |NONE |NONE |rootn |NONE | -|FHYPOT 
|hypot |NONE |NONE |NONE |hypot |NONE | -|FRECIP |NONE |half\_recip|native\_recip|NONE |NONE (3)|fre, fres (4) | -|NONE |NONE |NONE |NONE |NONE |compound|NONE | -|FEXP2M1 |NONE |NONE |NONE |NONE |exp2m1 |NONE | -|FEXP10M1|NONE |NONE |NONE |NONE |exp10m1 |NONE | -|FLOG2P1 |NONE |NONE |NONE |NONE |log2p1 |NONE | -|FLOG10P1|NONE |NONE |NONE |NONE |log10p1 |NONE | - -Note (1) FSINCOS is macro-op fused (see below). +|opcode |OpenCL FP32|OpenCL FP16|OpenCL native|IEEE754 |Power ISA |My 66000 ISA | +|------------|-----------|-----------|-------------|-------------- |------------------------|-------------| +|fsin |sin |half\_sin |native\_sin |sin |NONE |sin | +|fcos |cos |half\_cos |native\_cos |cos |NONE |cos | +|ftan |tan |half\_tan |native\_tan |tan |NONE |tan | +|NONE (1) |sincos |NONE |NONE |NONE |NONE | | +|fasin |asin |NONE |NONE |asin |NONE |asin | +|facos |acos |NONE |NONE |acos |NONE |acos | +|fatan |atan |NONE |NONE |atan |NONE |atan | +|fsinpi |sinpi |NONE |NONE |sinPi |NONE |sinpi | +|fcospi |cospi |NONE |NONE |cosPi |NONE |cospi | +|ftanpi |tanpi |NONE |NONE |tanPi |NONE |tanpi | +|fasinpi |asinpi |NONE |NONE |asinPi |NONE |asinpi | +|facospi |acospi |NONE |NONE |acosPi |NONE |acospi | +|fatanpi |atanpi |NONE |NONE |atanPi |NONE |atanpi | +|fsinh |sinh |NONE |NONE |sinh |NONE | | +|fcosh |cosh |NONE |NONE |cosh |NONE | | +|ftanh |tanh |NONE |NONE |tanh |NONE | | +|fasinh |asinh |NONE |NONE |asinh |NONE | | +|facosh |acosh |NONE |NONE |acosh |NONE | | +|fatanh |atanh |NONE |NONE |atanh |NONE | | +|fatan2 |atan2 |NONE |NONE |atan2 |NONE |atan2 | +|fatan2pi |atan2pi |NONE |NONE |atan2pi |NONE |atan2pi | +|frsqrt |rsqrt |half\_rsqrt|native\_rsqrt|rSqrt |fsqrte, fsqrtes (4) |rsqrt | +|fcbrt |cbrt |NONE |NONE |NONE (2) |NONE | | +|fexp2 |exp2 |half\_exp2 |native\_exp2 |exp2 |NONE |exp2 | +|flog2 |log2 |half\_log2 |native\_log2 |log2 |NONE |ln2 | +|fexpm1 |expm1 |NONE |NONE |expm1 |NONE |expm1 | +|flog1p |log1p |NONE |NONE |logp1 |NONE |logp1 | +|fexp |exp 
|half\_exp |native\_exp |exp |NONE |exp | +|flog |log |half\_log |native\_log |log |NONE |ln | +|fexp10 |exp10 |half\_exp10|native\_exp10|exp10 |NONE |exp10 | +|flog10 |log10 |half\_log10|native\_log10|log10 |NONE |log | +|fpow |pow |NONE |NONE |pow |NONE |pow | +|fpown |pown |NONE |NONE |pown |NONE | | +|fpowr |powr |half\_powr |native\_powr |powr |NONE | | +|frootn |rootn |NONE |NONE |rootn |NONE | | +|fhypot |hypot |NONE |NONE |hypot |NONE | | +|frecip |NONE |half\_recip|native\_recip|NONE (3) |fre, fres (4) |rcp | +|NONE |NONE |NONE |NONE |compound |NONE | | +|fexp2m1 |NONE |NONE |NONE |exp2m1 |NONE |exp2m1 | +|fexp10m1 |NONE |NONE |NONE |exp10m1 |NONE |exp10m1 | +|flog2p1 |NONE |NONE |NONE |log2p1 |NONE |ln2p1 | +|flog10p1 |NONE |NONE |NONE |log10p1 |NONE |logp1 | +|fminnum08 |fmin |fmin |NONE |minNum |xsmindp (5) | | +|fmaxnum08 |fmax |fmax |NONE |maxNum |xsmaxdp (5) | | +|fmin19 |fmin |fmin |NONE |minimum |NONE |fmin | +|fmax19 |fmax |fmax |NONE |maximum |NONE |fmax | +|fminnum19 |fmin |fmin |NONE |minimumNumber |vminfp (6), xsminjdp (5)| | +|fmaxnum19 |fmax |fmax |NONE |maximumNumber |vmaxfp (6), xsmaxjdp (5)| | +|fminc |fmin |fmin |NONE |NONE |xsmincdp (5) |fmin* | +|fmaxc |fmax |fmax |NONE |NONE |xsmaxcdp (5) |fmax* | +|fminmagnum08|minmag |minmag |NONE |minNumMag |NONE | | +|fmaxmagnum08|maxmag |maxmag |NONE |maxNumMag |NONE | | +|fminmag19 |minmag |minmag |NONE |minimumMagnitude |NONE | | +|fmaxmag19 |maxmag |maxmag |NONE |maximumMagnitude |NONE | | +|fminmagnum19|minmag |minmag |NONE |minimumMagnitudeNumber|NONE | | +|fmaxmagnum19|maxmag |maxmag |NONE |maximumMagnitudeNumber|NONE | | +|fminmagc |minmag |minmag |NONE |NONE |NONE | | +|fmaxmagc |maxmag |maxmag |NONE |NONE |NONE | | +|fmod |fmod |fmod | |NONE |NONE | | +|fremainder |remainder |remainder | |remainder |NONE | | + + from Mitch Alsup: + +* Brian's LLVM compiler converts fminc and fmaxc into fmin and fmax instructions +These are all IEEE 754-2019 compliant +These are native instructions not 
extensions +All listed functions are available in both F32 and F64 formats. +There is some confusion (in my head) about fmin and fmax. I intend both instructions to perform 754-2019 semantics-- +but I don't know if this is minimum/maximum or minimumNumber/maximumNumber. +fmad and remainder are a 2-instruction sequence--don't know how to "edit it in" + + +Note (1) fsincos is macro-op fused (see below). Note (2) synthesised in IEEE754-2019 as "rootn(x, 3)" @@ -186,60 +219,100 @@ Note (3) synthesised in IEEE754-2019 using "1.0 / x" Note (4) these are estimate opcodes that help accelerate software emulation +Note (5) f64-only (though can be used on f32 stored in f64 format), requires VSX. + +Note (6) 4xf32-only, requires VMX. + ## List of 2-arg opcodes -| opcode | Description | pseudocode | Extension | -| ------ | ---------------- | ---------------- | ----------- | -| FATAN2 | atan2 arc tangent | FRT = atan2(FRB, FRA) | Zarctrignpi | -| FATAN2PI | atan2 arc tangent / pi | FRT = atan2(FRB, FRA) / pi | Zarctrigpi | -| FPOW | x power of y | FRT = pow(FRA, FRB) | ZftransAdv | -| FPOWN | x power of n (n int) | FRT = pow(FRA, RB) | ZftransAdv | -| FPOWR | x power of y (x +ve) | FRT = exp(FRB * log(FRA)) | ZftransAdv | -| FROOTN | x power 1/n (n integer)| FRT = pow(FRA, 1/RB) | ZftransAdv | -| FHYPOT | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv | +| opcode | Description | pseudocode | Extension | +| ------ | ---------------- | ---------------- | ----------- | +| fatan2 | atan2 arc tangent | FRT = atan2(FRB, FRA) | Zarctrignpi | +| fatan2pi | atan2 arc tangent / pi | FRT = atan2(FRB, FRA) / pi | Zarctrigpi | +| fpow | x power of y | FRT = pow(FRA, FRB) | ZftransAdv | +| fpown | x power of n (n int) | FRT = pow(FRA, RB) | ZftransAdv | +| fpowr | x power of y (x +ve) | FRT = exp(FRB * log(FRA)) | ZftransAdv | +| frootn | x power 1/n (n integer) | FRT = pow(FRA, 1/RB) | ZftransAdv | +| fhypot | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv | +| fminnum08 | IEEE 754-2008
minNum | FRT = minNum(FRA, FRB) (1) | Zfminmax | +| fmaxnum08 | IEEE 754-2008 maxNum | FRT = maxNum(FRA, FRB) (1) | Zfminmax | +| fmin19 | IEEE 754-2019 minimum | FRT = minimum(FRA, FRB) | Zfminmax | +| fmax19 | IEEE 754-2019 maximum | FRT = maximum(FRA, FRB) | Zfminmax | +| fminnum19 | IEEE 754-2019 minimumNumber | FRT = minimumNumber(FRA, FRB) | Zfminmax | +| fmaxnum19 | IEEE 754-2019 maximumNumber | FRT = maximumNumber(FRA, FRB) | Zfminmax | +| fminc | C ternary-op minimum | FRT = FRA \< FRB ? FRA : FRB | Zfminmax | +| fmaxc | C ternary-op maximum | FRT = FRA > FRB ? FRA : FRB | Zfminmax | +| fminmagnum08 | IEEE 754-2008 minNumMag | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)| Zfminmax | +| fmaxmagnum08 | IEEE 754-2008 maxNumMag | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) | Zfminmax | +| fminmag19 | IEEE 754-2019 minimumMagnitude | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) | Zfminmax | +| fmaxmag19 | IEEE 754-2019 maximumMagnitude | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) | Zfminmax | +| fminmagnum19 | IEEE 754-2019 minimumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)| Zfminmax | +| fmaxmagnum19 | IEEE 754-2019 maximumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) | Zfminmax | +| fminmagc | C ternary-op minimum magnitude | FRT = minmaxmag(FRA, FRB, False, fminc) (2) | Zfminmax | +| fmaxmagc | C ternary-op maximum magnitude | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) | Zfminmax | +| fmod | modulus | FRT = fmod(FRA, FRB) | ZftransExt | +| fremainder | IEEE 754 remainder | FRT = remainder(FRA, FRB) | ZftransExt | + +Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than +0.0. This is left unspecified in IEEE 754-2008. 
+ +Note (2): minmaxmag(x, y, is_max, fallback) is defined as: + +```python +def minmaxmag(x, y, is_max, fallback): + a = abs(x) < abs(y) + b = abs(x) > abs(y) + if is_max: + a, b = b, a # swap + if a: + return x + if b: + return y + # equal magnitudes, or NaN input(s) + return fallback(x, y) +``` ## List of 1-arg transcendental opcodes | opcode | Description | pseudocode | Extension | | ------ | ---------------- | ---------------- | ---------- | -| FRSQRT | Reciprocal Square-root | FRT = 1.0 / sqrt(FRA) | Zfrsqrt | -| FCBRT | Cube Root | FRT = pow(FRA, 1.0 / 3) | ZftransAdv | -| FRECIP | Reciprocal | FRT = 1.0 / FRA | Zftrans | -| FEXP2M1 | power-2 minus 1 | FRT = pow(2, FRA) - 1.0 | ZftransExt | -| FLOG2P1 | log2 plus 1 | FRT = log(2, 1 + FRA) | ZftransExt | -| FEXP2 | power-of-2 | FRT = pow(2, FRA) | Zftrans | -| FLOG2 | log2 | FRT = log(2, FRA) | Zftrans | -| FEXPM1 | exponential minus 1 | FRT = pow(e, FRA) - 1.0 | ZftransExt | -| FLOG1P | log plus 1 | FRT = log(e, 1 + FRA) | ZftransExt | -| FEXP | exponential | FRT = pow(e, FRA) | ZftransExt | -| FLOG | natural log (base e) | FRT = log(e, FRA) | ZftransExt | -| FEXP10M1 | power-10 minus 1 | FRT = pow(10, FRA) - 1.0 | ZftransExt | -| FLOG10P1 | log10 plus 1 | FRT = log(10, 1 + FRA) | ZftransExt | -| FEXP10 | power-of-10 | FRT = pow(10, FRA) | ZftransExt | -| FLOG10 | log base 10 | FRT = log(10, FRA) | ZftransExt | +| frsqrt | Reciprocal Square-root | FRT = 1.0 / sqrt(FRA) | Zfrsqrt | +| fcbrt | Cube Root | FRT = pow(FRA, 1.0 / 3) | ZftransAdv | +| frecip | Reciprocal | FRT = 1.0 / FRA | Zftrans | +| fexp2m1 | power-2 minus 1 | FRT = pow(2, FRA) - 1.0 | ZftransExt | +| flog2p1 | log2 plus 1 | FRT = log(2, 1 + FRA) | ZftransExt | +| fexp2 | power-of-2 | FRT = pow(2, FRA) | Zftrans | +| flog2 | log2 | FRT = log(2,
FRA) | Zftrans | +| fexpm1 | exponential minus 1 | FRT = pow(e, FRA) - 1.0 | ZftransExt | +| flog1p | log plus 1 | FRT = log(e, 1 + FRA) | ZftransExt | +| fexp | exponential | FRT = pow(e, FRA) | ZftransExt | +| flog | natural log (base e) | FRT = log(e, FRA) | ZftransExt | +| fexp10m1 | power-10 minus 1 | FRT = pow(10, FRA) - 1.0 | ZftransExt | +| flog10p1 | log10 plus 1 | FRT = log(10, 1 + FRA) | ZftransExt | +| fexp10 | power-of-10 | FRT = pow(10, FRA) | ZftransExt | +| flog10 | log base 10 | FRT = log(10, FRA) | ZftransExt | ## List of 1-arg trigonometric opcodes | opcode | Description | pseudocode | Extension | | -------- | ------------------------ | ------------------------ | ----------- | -| FSIN | sin (radians) | FRT = sin(FRA) | Ztrignpi | -| FCOS | cos (radians) | FRT = cos(FRA) | Ztrignpi | -| FTAN | tan (radians) | FRT = tan(FRA) | Ztrignpi | -| FASIN | arcsin (radians) | FRT = asin(FRA) | Zarctrignpi | -| FACOS | arccos (radians) | FRT = acos(FRA) | Zarctrignpi | -| FATAN | arctan (radians) | FRT = atan(FRA) | Zarctrignpi | -| FSINPI | sin times pi | FRT = sin(pi * FRA) | Ztrigpi | -| FCOSPI | cos times pi | FRT = cos(pi * FRA) | Ztrigpi | -| FTANPI | tan times pi | FRT = tan(pi * FRA) | Ztrigpi | -| FASINPI | arcsin / pi | FRT = asin(FRA) / pi | Zarctrigpi | -| FACOSPI | arccos / pi | FRT = acos(FRA) / pi | Zarctrigpi | -| FATANPI | arctan / pi | FRT = atan(FRA) / pi | Zarctrigpi | -| FSINH | hyperbolic sin (radians) | FRT = sinh(FRA) | Zfhyp | -| FCOSH | hyperbolic cos (radians) | FRT = cosh(FRA) | Zfhyp | -| FTANH | hyperbolic tan (radians) | FRT = tanh(FRA) | Zfhyp | -| FASINH | inverse hyperbolic sin | FRT = asinh(FRA) | Zfhyp | -| FACOSH | inverse hyperbolic cos | FRT = acosh(FRA) | Zfhyp | -| FATANH | inverse hyperbolic tan | FRT = atanh(FRA) | Zfhyp | +| fsin | sin (radians) | FRT = sin(FRA) | Ztrignpi | +| fcos | cos (radians) | FRT = cos(FRA) | Ztrignpi | +| ftan | tan (radians) | FRT = tan(FRA) | Ztrignpi | +| fasin | arcsin (radians) | FRT 
= asin(FRA) | Zarctrignpi | +| facos | arccos (radians) | FRT = acos(FRA) | Zarctrignpi | +| fatan | arctan (radians) | FRT = atan(FRA) | Zarctrignpi | +| fsinpi | sin times pi | FRT = sin(pi * FRA) | Ztrigpi | +| fcospi | cos times pi | FRT = cos(pi * FRA) | Ztrigpi | +| ftanpi | tan times pi | FRT = tan(pi * FRA) | Ztrigpi | +| fasinpi | arcsin / pi | FRT = asin(FRA) / pi | Zarctrigpi | +| facospi | arccos / pi | FRT = acos(FRA) / pi | Zarctrigpi | +| fatanpi | arctan / pi | FRT = atan(FRA) / pi | Zarctrigpi | +| fsinh | hyperbolic sin (radians) | FRT = sinh(FRA) | Zfhyp | +| fcosh | hyperbolic cos (radians) | FRT = cosh(FRA) | Zfhyp | +| ftanh | hyperbolic tan (radians) | FRT = tanh(FRA) | Zfhyp | +| fasinh | inverse hyperbolic sin | FRT = asinh(FRA) | Zfhyp | +| facosh | inverse hyperbolic cos | FRT = acosh(FRA) | Zfhyp | +| fatanh | inverse hyperbolic tan | FRT = atanh(FRA) | Zfhyp | [[!inline pages="openpower/power_trans_ops" raw=yes ]] @@ -255,6 +328,8 @@ the less common subsets are still required for IEEE754 HPC. MALI Midgard, an embedded / mobile 3D GPU, for example only has the following opcodes: + 28 - fmin + 2C - fmax E8 - fatan_pt2 F0 - frcp (reciprocal) F2 - frsqrt (inverse square root, 1/sqrt(x)) @@ -271,6 +346,7 @@ Vivante Embedded/Mobile 3D (etnaviv ) only has the following: + fmin/fmax (implemented using SELECT) sin, cos2pi cos, sin2pi log2, exp @@ -282,6 +358,7 @@ It also has fast variants of some of these, as a CSR Mode. AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have: + MIN/MAX/MIN_DX10/MAX_DX10 COS2PI (appx) EXP2 LOG (IEEE754) @@ -291,7 +368,7 @@ RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have: SIN2PI (appx) AMD RDNA has F16 and F32 variants of all the above, and also has F64 -variants of SQRT, RSQRT and RECIP. It is interesting that even the +variants of SQRT, RSQRT, MIN, MAX, and RECIP. 
It is interesting that even the modern high-end AMD GPU does not have TAN or ATAN, where MALI Midgard does. @@ -320,7 +397,7 @@ They are therefore considered "base" (essential) transcendentals. ### ZftransExt -LOG, EXP, EXP10, LOG10, LOGP1, EXPM1 +LOG, EXP, EXP10, LOG10, LOGP1, EXPM1, fmod, fremainder These are extra transcendental functions that are useful, not generally needed for 3D, however for Numerical Computation they may be useful. @@ -359,6 +436,11 @@ CBRT, POW, POWN, POWR, ROOTN These are simply much more complex to implement in hardware, and typically will only be put into HPC applications. +Note that `pow` is commonly used in Blinn-Phong shading (the shading model used +by OpenGL 1.0 and commonly used by shader authors that need basic 3D graphics +with specular highlights), however it can be sufficiently emulated using +`pow(b, n) = exp2(n*log2(b))`. + * **Zfrsqrt**: Reciprocal square-root. ## Trigonometric subsets @@ -372,6 +454,11 @@ Ztrignpi are the basic trigonometric functions through which all others could be synthesised, and they are typically the base trigonometrics provided by GPUs for 3D, warranting their own subset. +(programmerjake: actually, all other GPU ISAs mentioned in this document have sinpi/cospi or equivalent, and often not sin/cos, because sinpi/cospi are actually *waay* easier to implement because range reduction is simply a bitwise mask, whereas for sin/cos range reduction is a full division by pi) + +(Mitch: My patent USPTO 10,761,806 shows that the above statement is no longer true.) + + In the case of the Ztrigpi subset, these are commonly used in for loops with a power of two number of subdivisions, and the cost of multiplying by PI inside each loop (or cumulative addition, resulting in cumulative @@ -405,6 +492,28 @@ is acceptable for 3D. Therefore they are their own subset extensions.
+### Zfminmax + +* fminnum08 fmaxnum08 +* fmin19 fmax19 +* fminnum19 fmaxnum19 +* fminc fmaxc +* fminmagnum08 fmaxmagnum08 +* fminmag19 fmaxmag19 +* fminmagnum19 fmaxmagnum19 +* fminmagc fmaxmagc + +These are commonly used for vector reductions, where having them be a single +instruction is critical. They are also commonly used in GPU shaders, HPC, and +general-purpose FP algorithms. + +These min and max operations are quite cheap to implement hardware-wise, +being comparable in cost to fcmp + some muxes. They're all in one extension +because once you implement some of them, the rest require only slightly more +hardware complexity. + +Therefore they are their own subset extension. + # Synthesis, Pseudo-code ops and macro-ops The pseudo-ops are best left up to the compiler rather than being actual @@ -412,12 +521,12 @@ pseudo-ops, by allocating one scalar FP register for use as a constant (loop invariant) set to "1.0" at the beginning of a function or other suitable code block. -* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order). -* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order). +* fsincos - fused macro-op between fsin and fcos (issued in that order). +* fsincospi - fused macro-op between fsinpi and fcospi (issued in that order). -FATANPI example pseudo-code: +fatanpi example pseudo-code: - fmvis ft0, 0x3F800 // upper bits of f32 1.0 (BF16) + fmvis ft0, 0x3F80 // upper bits of f32 1.0 (BF16) fatan2pis FRT, FRA, ft0 Hyperbolic function example (obviates need for Zfhyp except for @@ -425,6 +534,10 @@ high-performance or correctly-rounding): ASINH( x ) = ln( x + SQRT(x**2+1)) +`pow` sufficient for 3D Graphics: + + pow(b, x) = exp2(x * log2(b)) + # Evaluation and commentary Moved to [[discussion]]
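
The identities quoted in the "Synthesis, Pseudo-code ops and macro-ops" section can be checked numerically. What follows is only a minimal sketch in Python, not part of the patch and not a model of the hardware: the function names are hypothetical, `2.0 ** t` stands in for `exp2`, and `math.atan2(y, x)` uses the C/Python argument order, which is an assumption and says nothing about the FRA/FRB operand order of `fatan2pi`.

```python
import math


def pow_via_exp2(b, x):
    """pow(b, x) = exp2(x * log2(b)); only meaningful for b > 0."""
    return 2.0 ** (x * math.log2(b))


def asinh_via_ln(x):
    """ASINH(x) = ln(x + SQRT(x**2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1.0))


def atanpi_via_atan2pi(x):
    """atanpi(x) synthesised as atan2(x, 1.0) / pi (C-style atan2(y, x))."""
    return math.atan2(x, 1.0) / math.pi


if __name__ == "__main__":
    # Compare each synthesised form against the library reference.
    print(pow_via_exp2(3.0, 2.5), math.pow(3.0, 2.5))
    print(asinh_via_ln(0.5), math.asinh(0.5))
    print(atanpi_via_atan2pi(1.0), math.atan(1.0) / math.pi)
```

These forms are adequate for illustration only: the `pow` identity is undefined for `b <= 0`, and the `asinh` form loses accuracy for large negative `x`, which is consistent with the text's remark that a dedicated Zfhyp implementation is still wanted for high-performance or correctly-rounded results.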