X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=ztrans_proposal.mdwn;h=ca73c4220a0faf66a30917a4b688fe32d0a6a657;hb=9383bbd3177aa5fd486a7e0dcbb8179b11238a4c;hp=51c6be7af6e52e36d6a0ae25806ca9eb13fd09a0;hpb=6a8d014788173d7472233fd110bb7634000d5e00;p=libreriscv.git

diff --git a/ztrans_proposal.mdwn b/ztrans_proposal.mdwn
index 51c6be7af..ca73c4220 100644
--- a/ztrans_proposal.mdwn
+++ b/ztrans_proposal.mdwn
@@ -1,7 +1,14 @@
+**OBSOLETE**, superseded by [[openpower/transcendentals]]
+
 # Zftrans - transcendental operations
 
-With thanks to:
+Summary:
+
+*This proposal extends RISC-V scalar floating point operations to add IEEE754 transcendental functions (pow, log etc) and trigonometric functions (sin, cos etc). These functions are also 98% shared with the Khronos Group OpenCL Extended Instruction Set.*
+
+Authors/Contributors:
+
+* Luke Kenneth Casson Leighton
 * Jacob Lifshay
 * Dan Petroski
 * Mitch Alsup
@@ -54,7 +61,7 @@ Minimum recommended requirements for Mobile-Embedded 3D: Ztrignpi, Zftrans, with
 
 This proposal is designed to meet a wide range of extremely diverse needs,
 allowing implementors from all of them to benefit from the tools and hardware
-cost reductions associated with common standards adoption.
+cost reductions associated with common standards adoption in RISC-V (primarily IEEE754 and Vulkan).
 
 **There are *four* different, disparate platform's needs (two new)**:
 
@@ -210,11 +217,14 @@ Any deviation from Trademarked Standards means that an implementation
 may not be sold and also make a claim of being, for example, "Vulkan
 compatible".
 
-This in turn reinforces and makes a hard requirement a need for public
+For 3D, this in turn reinforces and makes a hard requirement a need for public
 compliance with such standards, over-and-above what would otherwise be
 set by a RISC-V Standards Development Process, including both the
 software compliance and the knock-on implications that has for hardware.
+For libraries such as libm and numpy, accuracy is paramount, for software interoperability across multiple platforms. Some algorithms critically rely on correct IEEE754, for example.
+The conflicting accuracy requirements can be met through the zfpacc extension.
+
 **Collaboration**:
 
 The case for collaboration on any Extension is already well-known.
@@ -310,15 +320,19 @@ and is **not** an appropriate methodology for use in other Extensions
 with huge (non-uniform) market diversity even with similarly large
 numbers of potential opcodes. BitManip is the perfect counter-example.
 
-# Proposed Opcodes vs Khronos OpenCL Opcodes
+# Proposed Opcodes vs Khronos OpenCL vs IEEE754-2019
+
+This list shows the (direct) equivalence between proposed opcodes,
+their Khronos OpenCL equivalents, and their IEEE754-2019 equivalents.
+98% of the opcodes in this proposal that are in the IEEE754-2019 standard
+are present in the Khronos Extended Instruction Set.
 
-This list shows the (direct) equivalence between proposed opcodes and
-their Khronos OpenCL equivalents.
 
 For RISCV opcode encodings see
 [[rv_major_opcode_1010011]]
 
 See
+and
 
 * Special FP16 opcodes are *not* being proposed, except by indirect / inherent
   use of the "fmt" field that is already present in the RISC-V Specification.
@@ -334,47 +348,62 @@ Khronos Specification accuracy requirements - is not an option, as it
 results in non-compliance, and the vendor may not use the Trademarked words
 "Vulkan" etc. in conjunction with their product.
 
+IEEE754-2019 Table 9.1 lists "additional mathematical operations".
+Interestingly the only functions missing when compared to OpenCL are
+compound, exp2m1, exp10m1, log2p1, log10p1, pown (integer power) and powr.
+
 [[!table data="""
-Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
-FSIN | sin | half\_sin | native\_sin | NONE |
-FCOS | cos | half\_cos | native\_cos | NONE |
-FTAN | tan | half\_tan | native\_tan | NONE |
-NONE (1) | sincos | NONE | NONE | NONE |
-FASIN | asin | NONE | NONE | NONE |
-FACOS | acos | NONE | NONE | NONE |
-FATAN | atan | NONE | NONE | NONE |
-FSINPI | sinpi | NONE | NONE | NONE |
-FCOSPI | cospi | NONE | NONE | NONE |
-FTANPI | tanpi | NONE | NONE | NONE |
-FASINPI | asinpi | NONE | NONE | NONE |
-FACOSPI | acospi | NONE | NONE | NONE |
-FATANPI | atanpi | NONE | NONE | NONE |
-FSINH | sinh | NONE | NONE | NONE |
-FCOSH | cosh | NONE | NONE | NONE |
-FTANH | tanh | NONE | NONE | NONE |
-FASINH | asinh | NONE | NONE | NONE |
-FACOSH | acosh | NONE | NONE | NONE |
-FATANH | atanh | NONE | NONE | NONE |
-FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE |
-FCBRT | cbrt | NONE | NONE | NONE |
-FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE |
-FLOG2 | log2 | half\_log2 | native\_log2 | NONE |
-FEXPM1 | expm1 | NONE | NONE | NONE |
-FLOG1P | log1p | NONE | NONE | NONE |
-FEXP | exp | half\_exp | native\_exp | NONE |
-FLOG | log | half\_log | native\_log | NONE |
-FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE |
-FLOG10 | log10 | half\_log10 | native\_log10 | NONE |
-FATAN2 | atan2 | NONE | NONE | NONE |
-FATAN2PI | atan2pi | NONE | NONE | NONE |
-FPOW | pow | NONE | NONE | NONE |
-FROOT | rootn | NONE | NONE | NONE |
-FHYPOT | hypot | NONE | NONE | NONE |
-FRECIP | NONE | half\_recip | native\_recip | NONE |
+opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast | IEEE754 |
+FSIN | sin | half\_sin | native\_sin | NONE | sin |
+FCOS | cos | half\_cos | native\_cos | NONE | cos |
+FTAN | tan | half\_tan | native\_tan | NONE | tan |
+NONE (1) | sincos | NONE | NONE | NONE | NONE |
+FASIN | asin | NONE | NONE | NONE | asin |
+FACOS | acos | NONE | NONE | NONE | acos |
+FATAN | atan | NONE | NONE | NONE | atan |
+FSINPI | sinpi | NONE | NONE | NONE | sinPi |
+FCOSPI | cospi | NONE | NONE | NONE | cosPi |
+FTANPI | tanpi | NONE | NONE | NONE | tanPi |
+FASINPI | asinpi | NONE | NONE | NONE | asinPi |
+FACOSPI | acospi | NONE | NONE | NONE | acosPi |
+FATANPI | atanpi | NONE | NONE | NONE | atanPi |
+FSINH | sinh | NONE | NONE | NONE | sinh |
+FCOSH | cosh | NONE | NONE | NONE | cosh |
+FTANH | tanh | NONE | NONE | NONE | tanh |
+FASINH | asinh | NONE | NONE | NONE | asinh |
+FACOSH | acosh | NONE | NONE | NONE | acosh |
+FATANH | atanh | NONE | NONE | NONE | atanh |
+FATAN2 | atan2 | NONE | NONE | NONE | atan2 |
+FATAN2PI | atan2pi | NONE | NONE | NONE | atan2Pi |
+FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE | rSqrt |
+FCBRT | cbrt | NONE | NONE | NONE | NONE (2) |
+FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE | exp2 |
+FLOG2 | log2 | half\_log2 | native\_log2 | NONE | log2 |
+FEXPM1 | expm1 | NONE | NONE | NONE | expm1 |
+FLOG1P | log1p | NONE | NONE | NONE | logp1 |
+FEXP | exp | half\_exp | native\_exp | NONE | exp |
+FLOG | log | half\_log | native\_log | NONE | log |
+FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE | exp10 |
+FLOG10 | log10 | half\_log10 | native\_log10 | NONE | log10 |
+FPOW | pow | NONE | NONE | NONE | pow |
+FPOWN | pown | NONE | NONE | NONE | pown |
+FPOWR | powr | half\_powr | native\_powr | NONE | powr |
+FROOTN | rootn | NONE | NONE | NONE | rootn |
+FHYPOT | hypot | NONE | NONE | NONE | hypot |
+FRECIP | NONE | half\_recip | native\_recip | NONE | NONE (3) |
+NONE | NONE | NONE | NONE | NONE | compound |
+NONE | NONE | NONE | NONE | NONE | exp2m1 |
+NONE | NONE | NONE | NONE | NONE | exp10m1 |
+NONE | NONE | NONE | NONE | NONE | log2p1 |
+NONE | NONE | NONE | NONE | NONE | log10p1 |
 """]]
 
 Note (1) FSINCOS is macro-op fused (see below).
+Note (2) synthesised in IEEE754-2019 as "rootn(x, 3)"
+
+Note (3) synthesised in IEEE754-2019 using "1.0 / x"
+
 ## List of 2-arg opcodes
 
 [[!table data="""
@@ -382,7 +411,9 @@ opcode | Description | pseudocode | Extension |
 FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
 FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
 FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
-FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
+FPOWN | x power of n (n int) | rd = pow(rs1, rs2) | ZftransAdv |
+FPOWR | x power of y (x +ve) | rd = exp(rs2 * log(rs1)) | ZftransAdv |
+FROOTN | x power 1/n (n integer)| rd = pow(rs1, 1/rs2) | ZftransAdv |
 FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | ZftransAdv |
 """]]
 
@@ -415,7 +446,6 @@ FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
 FATAN | arctan (radians) | rd = atan(rs1) | Zarctrignpi |
 FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
 FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
-
 FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
 FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
 FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
@@ -446,8 +476,8 @@ following opcodes:
     F3 - fsqrt (square root)
     F4 - fexp2 (2^x)
     F5 - flog2
-    F6 - fsin
-    F7 - fcos
+    F6 - fsin1pi
+    F7 - fcos1pi
     F9 - fatan_pt1
 
 These in FP32 and FP16 only: no FP32 hardware, at all.
@@ -465,13 +495,13 @@ It also has fast variants of some of these, as a CSR Mode.
 AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the
 RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
 
-    COS (appx)
+    COS2PI (appx)
     EXP2
     LOG (IEEE754)
     RECIP
     RSQRT
     SQRT
-    SIN (appx)
+    SIN2PI (appx)
 
 AMD RDNA has F16 and F32 variants of all the above, and also has F64
 variants of SQRT, RSQRT and RECIP. It is interesting that even the
@@ -536,12 +566,10 @@ HPC and high-end GPUs are likely markets for these.
 ### ZftransAdv
 
-CBRT, POW, ROOT (inverse of POW): these are simply much more complex
-to implement in hardware, and typically will only be put into HPC
-applications.
+CBRT, POW, POWN, POWR, ROOTN
 
-ROOT is included as well as POW because at the extreme ranges one is
-more accurate than the other.
+These are simply much more complex to implement in hardware, and typically
+will only be put into HPC applications.
 
 * **Zfrsqrt**: Reciprocal square-root.
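
For readers comparing the ZftransAdv operations named in this change, the following is a minimal Python sketch of their mathematical definitions, using the usual identities: powr(x, y) = exp(y * log(x)) for positive x, rootn(x, n) = x^(1/n), and cbrt(x) = rootn(x, 3). The function names (`fpowr`, `frootn`, `fcbrt`, `fpown`) are hypothetical and chosen only to mirror the opcode names. The code runs in Python double precision and makes no attempt at IEEE754 or Khronos accuracy compliance, nor at the special-case handling (NaN, infinities, signed zero) a real implementation would need.

```python
import math


def fpowr(x: float, y: float) -> float:
    """powr(x, y): x to the power y, defined only for positive x here."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    return math.exp(y * math.log(x))


def frootn(x: float, n: int) -> float:
    """rootn(x, n): the n-th root of x, with n a non-zero integer."""
    if n == 0:
        raise ValueError("n must be non-zero")
    if x < 0.0:
        # A negative x has a real n-th root only when n is odd.
        if n % 2 == 0:
            raise ValueError("even root of a negative number")
        return -math.pow(-x, 1.0 / n)
    return math.pow(x, 1.0 / n)


def fcbrt(x: float) -> float:
    """cbrt(x), expressed as rootn(x, 3)."""
    return frootn(x, 3)


def fpown(x: float, n: int) -> float:
    """pown(x, n): x raised to an integer power n."""
    return math.pow(x, n)
```

Note that `frootn` differs from a plain `pow(x, 1/n)` in accepting negative `x` for odd `n`, which is one reason a dedicated root operation is useful alongside a general power operation.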