# Zftrans - transcendental operations
+Summary:
+
+*This proposal extends RISC-V scalar floating-point operations with IEEE754 transcendental functions (pow, log, etc.) and trigonometric functions (sin, cos, etc.). 98% of these functions are shared with the Khronos Group OpenCL Extended Instruction Set.*
+
With thanks to:
* Jacob Lifshay
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.
-Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Ztrignpi, Zarctrigpi,
-Zarctrignpi
+Minimum recommended requirements for 3D: Zftrans, Ztrignpi,
+Zarctrignpi, with Ztrigpi and Zarctrigpi as augmentations.
-Minimum recommended requirements for Mobile-Embedded 3D: Ztrigpi, Zftrans, Ztrignpi
+Minimum recommended requirements for Mobile-Embedded 3D: Ztrignpi, Zftrans, with Ztrigpi as an augmentation.
# TODO:
This proposal is designed to meet a wide range of extremely diverse needs,
allowing implementors from all of them to benefit from the tools and hardware
-cost reductions associated with common standards adoption.
+cost reductions associated with common standards adoption in RISC-V (primarily IEEE754 and Vulkan).
**There are *four* different, disparate platforms' needs (two new)**:
may not be sold and also make a claim of being, for example, "Vulkan
compatible".
-This in turn reinforces and makes a hard requirement a need for public
+For 3D, this in turn reinforces, and makes a hard requirement of, the need for public
compliance with such standards, over-and-above what would otherwise be
set by a RISC-V Standards Development Process, including both the
software compliance and the knock-on implications that has for hardware.
+For libraries such as libm and numpy, accuracy is paramount for software interoperability across multiple platforms: some algorithms, for example, rely critically on correct IEEE754 behaviour.
+The conflicting accuracy requirements can be met through the zfpacc extension.
+
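To make "accuracy" concrete: floating-point error is commonly quoted in units in the last place (ULP). The following is a minimal Python sketch, not part of the proposal, of how such an error could be measured. `ulp_error` is a hypothetical helper name, and because Python floats are binary64 the result is expressed in binary64 ULPs (`math.ulp` requires Python 3.9 or later).

```python
import math


def ulp_error(approx: float, reference: float) -> float:
    """Return |approx - reference| measured in ULPs of the reference value.

    Hypothetical helper for illustration only: it assumes finite, non-zero
    binary64 inputs and does not model FP16/FP32 formats.
    """
    return abs(approx - reference) / math.ulp(reference)


if __name__ == "__main__":
    # The next representable double above 1.0 is exactly one ULP away.
    print(ulp_error(1.0 + 2**-52, 1.0))
    # A library result can be compared against a higher-precision reference
    # in the same way; here the two arguments are identical, so the error is 0.
    print(ulp_error(math.sin(0.5), math.sin(0.5)))
```

A 3D implementation and a libm-style implementation of the same opcode could then be compared by the maximum `ulp_error` each one is allowed to exhibit.
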
**Collaboration**:
The case for collaboration on any Extension is already well-known.
with huge (non-uniform) market diversity even with similarly large
numbers of potential opcodes. BitManip is the perfect counter-example.
-# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>
+# Proposed Opcodes vs Khronos OpenCL vs IEEE754-2019 <a name="khronos_equiv"></a>
+
+This list shows the (direct) equivalence between proposed opcodes,
+their Khronos OpenCL equivalents, and their IEEE754-2019 equivalents.
+98% of the opcodes in this proposal that are in the IEEE754-2019 standard
+are present in the Khronos Extended Instruction Set.
-This list shows the (direct) equivalence between proposed opcodes and
-their Khronos OpenCL equivalents.
For RISCV opcode encodings see
[[rv_major_opcode_1010011]]
See
<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
+and <https://ieeexplore.ieee.org/document/8766229>
* Special FP16 opcodes are *not* being proposed, except by indirect / inherent
use of the "fmt" field that is already present in the RISC-V Specification.
results in non-compliance, and the vendor may not use the Trademarked words
"Vulkan" etc. in conjunction with their product.
+IEEE754-2019 Table 9.1 lists "additional mathematical operations".
+Interestingly, the only functions in that table that are missing from OpenCL are
+compound, exp2m1, exp10m1, log2p1 and log10p1.
+
[[!table data="""
-Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
-FSIN | sin | half\_sin | native\_sin | NONE |
-FCOS | cos | half\_cos | native\_cos | NONE |
-FTAN | tan | half\_tan | native\_tan | NONE |
-NONE (1) | sincos | NONE | NONE | NONE |
-FASIN | asin | NONE | NONE | NONE |
-FACOS | acos | NONE | NONE | NONE |
-FATAN | atan | NONE | NONE | NONE |
-FSINPI | sinpi | NONE | NONE | NONE |
-FCOSPI | cospi | NONE | NONE | NONE |
-FTANPI | tanpi | NONE | NONE | NONE |
-FASINPI | asinpi | NONE | NONE | NONE |
-FACOSPI | acospi | NONE | NONE | NONE |
-FATANPI | atanpi | NONE | NONE | NONE |
-FSINH | sinh | NONE | NONE | NONE |
-FCOSH | cosh | NONE | NONE | NONE |
-FTANH | tanh | NONE | NONE | NONE |
-FASINH | asinh | NONE | NONE | NONE |
-FACOSH | acosh | NONE | NONE | NONE |
-FATANH | atanh | NONE | NONE | NONE |
-FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE |
-FCBRT | cbrt | NONE | NONE | NONE |
-FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE |
-FLOG2 | log2 | half\_log2 | native\_log2 | NONE |
-FEXPM1 | expm1 | NONE | NONE | NONE |
-FLOG1P | log1p | NONE | NONE | NONE |
-FEXP | exp | half\_exp | native\_exp | NONE |
-FLOG | log | half\_log | native\_log | NONE |
-FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE |
-FLOG10 | log10 | half\_log10 | native\_log10 | NONE |
-FATAN2 | atan2 | NONE | NONE | NONE |
-FATAN2PI | atan2pi | NONE | NONE | NONE |
-FPOW | pow | NONE | NONE | NONE |
-FROOT | rootn | NONE | NONE | NONE |
-FHYPOT | hypot | NONE | NONE | NONE |
-FRECIP | NONE | half\_recip | native\_recip | NONE |
+opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast | IEEE754 |
+FSIN | sin | half\_sin | native\_sin | NONE | sin |
+FCOS | cos | half\_cos | native\_cos | NONE | cos |
+FTAN | tan | half\_tan | native\_tan | NONE | tan |
+NONE (1) | sincos | NONE | NONE | NONE | NONE |
+FASIN | asin | NONE | NONE | NONE | asin |
+FACOS | acos | NONE | NONE | NONE | acos |
+FATAN | atan | NONE | NONE | NONE | atan |
+FSINPI | sinpi | NONE | NONE | NONE | sinPi |
+FCOSPI | cospi | NONE | NONE | NONE | cosPi |
+FTANPI | tanpi | NONE | NONE | NONE | tanPi |
+FASINPI | asinpi | NONE | NONE | NONE | asinPi |
+FACOSPI | acospi | NONE | NONE | NONE | acosPi |
+FATANPI | atanpi | NONE | NONE | NONE | atanPi |
+FSINH | sinh | NONE | NONE | NONE | sinh |
+FCOSH | cosh | NONE | NONE | NONE | cosh |
+FTANH | tanh | NONE | NONE | NONE | tanh |
+FASINH | asinh | NONE | NONE | NONE | asinh |
+FACOSH | acosh | NONE | NONE | NONE | acosh |
+FATANH | atanh | NONE | NONE | NONE | atanh |
+FATAN2 | atan2 | NONE | NONE | NONE | atan2 |
+FATAN2PI | atan2pi | NONE       | NONE          | NONE        | atan2Pi |
+FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE | rSqrt |
+FCBRT | cbrt | NONE | NONE | NONE | NONE (2) |
+FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE | exp2 |
+FLOG2 | log2 | half\_log2 | native\_log2 | NONE | log2 |
+FEXPM1 | expm1 | NONE | NONE | NONE | expm1 |
+FLOG1P | log1p | NONE | NONE | NONE | logp1 |
+FEXP | exp | half\_exp | native\_exp | NONE | exp |
+FLOG | log | half\_log | native\_log | NONE | log |
+FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE | exp10 |
+FLOG10 | log10 | half\_log10 | native\_log10 | NONE | log10 |
+FPOW | pow | NONE | NONE | NONE | pow |
+FPOWN | pown | NONE | NONE | NONE | pown |
+FPOWR | powr | half\_powr | native\_powr | NONE | powr |
+FROOTN | rootn | NONE | NONE | NONE | rootn |
+FHYPOT | hypot | NONE | NONE | NONE | hypot |
+FRECIP | NONE | half\_recip | native\_recip | NONE | NONE (3) |
+NONE | NONE | NONE | NONE | NONE | compound |
+NONE | NONE | NONE | NONE | NONE | exp2m1 |
+NONE | NONE | NONE | NONE | NONE | exp10m1 |
+NONE | NONE | NONE | NONE | NONE | log2p1 |
+NONE | NONE | NONE | NONE | NONE | log10p1 |
"""]]
Note (1) FSINCOS is macro-op fused (see below).
+Note (2) synthesised in IEEE754-2019 as "rootn(x, 3)"
+
+Note (3) synthesised in IEEE754-2019 using "1.0 / x"
+
## List of 2-arg opcodes
[[!table data="""
FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
-FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
+FPOWN | x power of n (n int) | rd = pow(rs1, rs2) | ZftransAdv |
+FPOWR   | x power of y (x +ve)   | rd = exp(rs2 * log(rs1)) | ZftransAdv   |
+FROOTN | x power 1/n (n integer)| rd = pow(rs1, 1/rs2) | ZftransAdv |
FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | ZftransAdv |
"""]]
FATAN | arctan (radians) | rd = atan(rs1) | Zarctrignpi |
FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
-
FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
F3 - fsqrt (square root)
F4 - fexp2 (2^x)
F5 - flog2
- F6 - fsin
- F7 - fcos
+ F6 - fsin1pi
+ F7 - fcos1pi
F9 - fatan_pt1
These in FP32 and FP16 only: no FP64 hardware, at all.
AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the
RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
- COS (appx)
+ COS2PI (appx)
EXP2
LOG (IEEE754)
RECIP
RSQRT
SQRT
- SIN (appx)
+ SIN2PI (appx)
AMD RDNA has F16 and F32 variants of all the above, and also has F64
variants of SQRT, RSQRT and RECIP. It is interesting that even the
### ZftransAdv
-CBRT, POW, ROOT (inverse of POW): these are simply much more complex
-to implement in hardware, and typically will only be put into HPC
-applications.
+CBRT, POW, POWN, POWR, ROOTN
-ROOT is included as well as POW because at the extreme ranges one is
-more accurate than the other.
+These are simply much more complex to implement in hardware, and typically
+will only be put into HPC applications.
* **Zfrsqrt**: Reciprocal square-root.