-# Transcendental operations
+# DRAFT Scalar Transcendentals
Summary:
Minimum recommended requirements for Mobile-Embedded 3D:
Ztrignpi, Zftrans, with Ztrigpi as an augmentation.
+The Platform Requirements for 3D are driven by cost-competitive
+factors. The Trademarked Vulkan Specification provides clear direction
+for 3D GPU markets only; it gives none for anything else (where IEEE754 applies).
+Implementors must note that minimum Compliance with the Third Party
+Vulkan Specification (chosen for power-area competitiveness against
+other 3D GPU manufacturers) will not qualify for strict IEEE754 accuracy Compliance, or vice-versa.
+
+Implementors **must** make it clear which accuracy level is implemented, provide a switching mechanism, and raise Illegal Instruction traps if fully compliant accuracy cannot be achieved.
+It is also the Implementor's responsibility to comply with all Third Party Certification Marks and Trademarks (Vulkan, OpenCL). Nothing in this specification in any way implies that any Third Party Certification Mark Compliance is granted, nullified, altered or overridden by this document.
+
+
# TODO:
* Decision on accuracy, moved to [[zfpacc_proposal]]
and hardware cost reductions associated with common standards adoption
in Power ISA (primarily IEEE754 and Vulkan).
-**There are *four* different, disparate platform's needs (two new)**:
-
-* 3D Embedded Platform (new)
-* Embedded Platform
-* 3D UNIX Platform (new)
-* UNIX Platform
-
**The use-cases are**:
* 3D GPUs
* Ultra-low-power (smartwatches where GPU power budgets are in milliwatts)
* Mobile-Embedded (good performance with high efficiency for battery life)
* Desktop Computing
-* Server / HPC (2)
-
-(2) Supercomputing is left out of the requirements as it is traditionally
-covered by Supercomputer Vectorisation Standards (such as RVV).
+* Server / HPC / Supercomputing
**The software requirements are**:
seeking public Certification and Endorsement from the Khronos Group
under their Trademarked Certification Programme.
-**The "contra"-requirements are**:
-
- Ultra Low Power Embedded platforms (smart watches) are sufficiently
- resource constrained that Vectorisation (of any kind) is likely to be
- unnecessary and inappropriate.
-* The requirements are **not** for the purposes of developing a full custom
- proprietary GPU with proprietary firmware driven by *hardware* centric
- optimised design decisions as a priority over collaboration.
-* A full custom proprietary GPU ASIC Manufacturer *may* benefit from
- this proposal however the fact that they typically develop proprietary
- software that is not shared with the rest of the community likely to
- use this proposal means that they have completely different needs.
-* This proposal is for *sharing* of effort in reducing development costs
-
# Proposed Opcodes vs Khronos OpenCL vs IEEE754-2019<a name="khronos_equiv"></a>
This list shows the (direct) equivalence between proposed opcodes,
|FHYPOT |hypot |NONE |NONE |NONE |hypot |NONE |
|FRECIP |NONE |half\_recip|native\_recip|NONE |NONE (3)|fre, fres (4) |
|NONE |NONE |NONE |NONE |NONE |compound|NONE |
-|NONE |NONE |NONE |NONE |NONE |exp2m1 |NONE |
-|NONE |NONE |NONE |NONE |NONE |exp10m1 |NONE |
-|NONE |NONE |NONE |NONE |NONE |log2p1 |NONE |
-|NONE |NONE |NONE |NONE |NONE |log10p1 |NONE |
+|FEXP2M1 |NONE |NONE |NONE |NONE |exp2m1 |NONE |
+|FEXP10M1 |NONE |NONE |NONE |NONE |exp10m1 |NONE |
+|FLOG2P1 |NONE |NONE |NONE |NONE |log2p1 |NONE |
+|FLOG10P1 |NONE |NONE |NONE |NONE |log10p1 |NONE |
Note (1) FSINCOS is macro-op fused (see below).
| opcode | Description | pseudocode | Extension |
| ------ | ---------------- | ---------------- | ----------- |
-| FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
-| FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
-| FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
-| FPOWN | x power of n (n int) | rd = pow(rs1, rs2) | ZftransAdv |
-| FPOWR | x power of y (x +ve) | rd = exp(rs1 log(rs2)) | ZftransAdv |
-| FROOTN | x power 1/n (n integer)| rd = pow(rs1, 1/rs2) | ZftransAdv |
-| FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | ZftransAdv |
+| FATAN2 | atan2 arc tangent | FRT = atan2(FRB, FRA) | Zarctrignpi |
+| FATAN2PI | atan2 arc tangent / pi | FRT = atan2(FRB, FRA) / pi | Zarctrigpi |
+| FPOW | x power of y | FRT = pow(FRA, FRB) | ZftransAdv |
+| FPOWN | x power of n (n int) | FRT = pow(FRA, RB) | ZftransAdv |
+| FPOWR | x power of y (x +ve) | FRT = exp(FRB * log(FRA)) | ZftransAdv |
+| FROOTN | x power 1/n (n integer)| FRT = pow(FRA, 1/RB) | ZftransAdv |
+| FHYPOT | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv |
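
For illustration only, the two-argument pseudocode in the table above can be modelled in Python. This is a minimal sketch, not a normative definition: the function names are hypothetical, operands are ordinary Python floats (double precision) and ints, and `fpowr` assumes the OpenCL `powr` reading (x to the power y, for positive x). Special cases such as NaN, infinities and negative bases are not handled.

```python
import math

# Hypothetical reference models of the 2-arg pseudocode.
# fra/frb stand for the FRA/FRB float operands, rb for the integer RB.

def fatan2(fra, frb):
    # FRT = atan2(FRB, FRA): FRB is the "y" argument, FRA the "x" argument.
    return math.atan2(frb, fra)

def fatan2pi(fra, frb):
    # FRT = atan2(FRB, FRA) / pi
    return math.atan2(frb, fra) / math.pi

def fpow(fra, frb):
    # FRT = pow(FRA, FRB)
    return math.pow(fra, frb)

def fpown(fra, rb):
    # FRT = pow(FRA, RB), RB an integer
    return math.pow(fra, rb)

def fpowr(fra, frb):
    # Assumed reading: x^y computed as exp(y * log(x)), valid for fra > 0 only.
    return math.exp(frb * math.log(fra))

def frootn(fra, rb):
    # FRT = pow(FRA, 1/RB); this sketch only supports non-negative fra.
    return math.pow(fra, 1.0 / rb)

def fhypot(fra, frb):
    # FRT = sqrt(FRA^2 + FRB^2)
    return math.hypot(fra, frb)
```

Note the operand order of `fatan2`: with FRA = 0 and FRB = 1 the model returns pi/2, matching `atan2(FRB, FRA)` in the table.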
## List of 1-arg transcendental opcodes
| opcode | Description | pseudocode | Extension |
| ------ | ---------------- | ---------------- | ----------- |
-| FRSQRT | Reciprocal Square-root | rd = sqrt(rs1) | Zfrsqrt |
-| FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | ZftransAdv |
-| FRECIP | Reciprocal | rd = 1.0 / rs1 | Zftrans |
-| FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
-| FLOG2 | log2 | rd = log(2. rs1) | Zftrans |
-| FEXPM1 | exponential minus 1 | rd = pow(e, rs1) - 1.0 | ZftransExt |
-| FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | ZftransExt |
-| FEXP | exponential | rd = pow(e, rs1) | ZftransExt |
-| FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt |
-| FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt |
-| FLOG10 | log base 10 | rd = log(10, rs1) | ZftransExt |
+| FRSQRT | Reciprocal Square-root | FRT = 1.0 / sqrt(FRA) | Zfrsqrt |
+| FCBRT | Cube Root | FRT = pow(FRA, 1.0 / 3) | ZftransAdv |
+| FRECIP | Reciprocal | FRT = 1.0 / FRA | Zftrans |
+| FEXP2M1 | power-2 minus 1 | FRT = pow(2, FRA) - 1.0 | ZftransExt |
+| FLOG2P1 | log2 plus 1 | FRT = log(2, 1 + FRA) | ZftransExt |
+| FEXP2 | power-of-2 | FRT = pow(2, FRA) | Zftrans |
+| FLOG2 | log2 | FRT = log(2, FRA) | Zftrans |
+| FEXPM1 | exponential minus 1 | FRT = pow(e, FRA) - 1.0 | ZftransExt |
+| FLOG1P | log plus 1 | FRT = log(e, 1 + FRA) | ZftransExt |
+| FEXP | exponential | FRT = pow(e, FRA) | ZftransExt |
+| FLOG | natural log (base e) | FRT = log(e, FRA) | ZftransExt |
+| FEXP10M1 | power-10 minus 1 | FRT = pow(10, FRA) - 1.0 | ZftransExt |
+| FLOG10P1 | log10 plus 1 | FRT = log(10, 1 + FRA) | ZftransExt |
+| FEXP10 | power-of-10 | FRT = pow(10, FRA) | ZftransExt |
+| FLOG10 | log base 10 | FRT = log(10, FRA) | ZftransExt |
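
As a reading aid, the one-argument pseudocode above can be sketched in Python. The function names are hypothetical and the models use plain double-precision floats, so they say nothing about the accuracy a real implementation must reach. Two assumptions are worth flagging: the table's `log(b, x)` is read as "log of x to base b" (Python's `math.log(x, b)` takes the arguments the other way round), and `frsqrt` follows the description "Reciprocal Square-root".

```python
import math

# Hypothetical reference models of the 1-arg pseudocode; x stands for FRA.

def frsqrt(x):   return 1.0 / math.sqrt(x)       # reciprocal square root
def fcbrt(x):    return math.pow(x, 1.0 / 3)     # non-negative x only here
def frecip(x):   return 1.0 / x

def fexp2(x):    return math.pow(2.0, x)         # pow(2, FRA)
def flog2(x):    return math.log2(x)             # log(2, FRA)
def fexp2m1(x):  return math.pow(2.0, x) - 1.0   # pow(2, FRA) - 1.0
def flog2p1(x):  return math.log2(1.0 + x)       # log(2, 1 + FRA)

def fexp(x):     return math.exp(x)              # pow(e, FRA)
def flog(x):     return math.log(x)              # log(e, FRA)
def fexpm1(x):   return math.expm1(x)            # pow(e, FRA) - 1.0
def flog1p(x):   return math.log1p(x)            # log(e, 1 + FRA)

def fexp10(x):   return math.pow(10.0, x)        # pow(10, FRA)
def flog10(x):   return math.log10(x)            # log(10, FRA)
def fexp10m1(x): return math.pow(10.0, x) - 1.0  # pow(10, FRA) - 1.0
def flog10p1(x): return math.log10(1.0 + x)      # log(10, 1 + FRA)
```

The `m1` and `p1` models for base 2 and base 10 are written literally from the pseudocode, so they lose precision for very small inputs; only `fexpm1` and `flog1p` use the dedicated standard-library routines.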
## List of 1-arg trigonometric opcodes
-| opcode | Description | pseudocode | Extension |
-| ------ | ---------------- | ---------------- | ----------- |
-| FSIN | sin (radians) | rd = sin(rs1) | Ztrignpi |
-| FCOS | cos (radians) | rd = cos(rs1) | Ztrignpi |
-| FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi |
-| FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi |
-| FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
-| FATAN | arctan (radians) | rd = atan(rs1) | Zarctrignpi |
-| FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
-| FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
-| FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
-| FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
-| FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
-| FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi |
-| FSINH | hyperbolic sin (radians) | rd = sinh(rs1) | Zfhyp |
-| FCOSH | hyperbolic cos (radians) | rd = cosh(rs1) | Zfhyp |
-| FTANH | hyperbolic tan (radians) | rd = tanh(rs1) | Zfhyp |
-| FASINH | inverse hyperbolic sin | rd = asinh(rs1) | Zfhyp |
-| FACOSH | inverse hyperbolic cos | rd = acosh(rs1) | Zfhyp |
-| FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp |
+| opcode | Description | pseudocode | Extension |
+| -------- | ------------------------ | ------------------------ | ----------- |
+| FSIN | sin (radians) | FRT = sin(FRA) | Ztrignpi |
+| FCOS | cos (radians) | FRT = cos(FRA) | Ztrignpi |
+| FTAN | tan (radians) | FRT = tan(FRA) | Ztrignpi |
+| FASIN | arcsin (radians) | FRT = asin(FRA) | Zarctrignpi |
+| FACOS | arccos (radians) | FRT = acos(FRA) | Zarctrignpi |
+| FATAN | arctan (radians) | FRT = atan(FRA) | Zarctrignpi |
+| FSINPI | sin times pi | FRT = sin(pi * FRA) | Ztrigpi |
+| FCOSPI | cos times pi | FRT = cos(pi * FRA) | Ztrigpi |
+| FTANPI | tan times pi | FRT = tan(pi * FRA) | Ztrigpi |
+| FASINPI | arcsin / pi | FRT = asin(FRA) / pi | Zarctrigpi |
+| FACOSPI | arccos / pi | FRT = acos(FRA) / pi | Zarctrigpi |
+| FATANPI | arctan / pi | FRT = atan(FRA) / pi | Zarctrigpi |
+| FSINH | hyperbolic sin (radians) | FRT = sinh(FRA) | Zfhyp |
+| FCOSH | hyperbolic cos (radians) | FRT = cosh(FRA) | Zfhyp |
+| FTANH | hyperbolic tan (radians) | FRT = tanh(FRA) | Zfhyp |
+| FASINH | inverse hyperbolic sin | FRT = asinh(FRA) | Zfhyp |
+| FACOSH | inverse hyperbolic cos | FRT = acosh(FRA) | Zfhyp |
+| FATANH | inverse hyperbolic tan | FRT = atanh(FRA) | Zfhyp |
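
The "pi" variants in the table differ from the radian forms only in scaling. A minimal Python sketch of that relationship follows; the function names are hypothetical and the code is a literal transcription of the pseudocode in double precision, not a statement about how hardware would compute these operations.

```python
import math

# Hypothetical literal models of the Ztrigpi / Zarctrigpi pseudocode.

def fsinpi(x):  return math.sin(math.pi * x)   # FRT = sin(pi * FRA)
def fcospi(x):  return math.cos(math.pi * x)   # FRT = cos(pi * FRA)
def ftanpi(x):  return math.tan(math.pi * x)   # FRT = tan(pi * FRA)

def fasinpi(x): return math.asin(x) / math.pi  # FRT = asin(FRA) / pi
def facospi(x): return math.acos(x) / math.pi  # FRT = acos(FRA) / pi
def fatanpi(x): return math.atan(x) / math.pi  # FRT = atan(FRA) / pi
```

Because `math.pi` is itself a rounded value, this literal model returns a tiny non-zero result for `fsinpi(1.0)` instead of exactly zero. That is a property of the sketch, and an implementation of the opcode is not bound by it.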
[[!inline pages="openpower/power_trans_ops" raw=yes ]]
Although they can be synthesised using Zftrans (LOG2 multiplied
by a constant), there is both a performance penalty and an
accuracy penalty towards the limits, which for IEEE754 compliance is
-unacceptable. In particular, LOG(1+rs1) in hardware may give much better
-accuracy at the lower end (very small rs1) than LOG(rs1).
+unacceptable. In particular, LOG(1+FRA) in hardware may give much better
+accuracy at the lower end (very small FRA) than LOG(FRA).
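
The accuracy point about very small inputs can be observed in ordinary double-precision arithmetic. The sketch below uses Python's standard `math.log1p` as a stand-in for a dedicated LOG(1+x) operation; that substitution is an assumption made for illustration and does not model any hardware. It also shows the "LOG2 multiplied by a constant" synthesis mentioned above, for a single well-behaved input.

```python
import math

x = 1e-18  # far below double-precision epsilon (about 2.2e-16)

# Forming 1.0 + x first rounds the sum to exactly 1.0, so the log is 0.0
# and every significant digit of the true result is lost.
naive = math.log(1.0 + x)

# log1p never forms 1.0 + x explicitly and keeps the result close to x.
dedicated = math.log1p(x)

# Natural log synthesised from log2 multiplied by the constant ln(2).
LN2 = math.log(2.0)
synthesised = math.log2(10.0) * LN2

print(naive, dedicated, synthesised)
```

For an input this small the true value of log(1+x) is very nearly x itself, so the naive result of zero has a relative error of 100 percent.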
Their forced inclusion would be inappropriate as it would penalise
embedded systems with tight power and area budgets. However if they