add nlnet proposals links

diff --git a/ztrans_proposal.mdwn b/ztrans_proposal.mdwn
index 4e2eb127ecf07f8033e6eb3acf81854e12cb5a81..5e2685850779d2e19251704dda8e7a88c1716a98 100644
--- a/ztrans_proposal.mdwn
+++ b/ztrans_proposal.mdwn
@@ -1,5 +1,9 @@
  # Zftrans - transcendental operations
  
+Summary:
+
+*This proposal extends RISC-V scalar floating-point operations to add IEEE754 transcendental functions (pow, log, etc.) and trigonometric functions (sin, cos, etc.). These functions are also 98% shared with the Khronos Group OpenCL Extended Instruction Set.*
+
  With thanks to:
  
  * Jacob Lifshay
@@ -33,10 +37,10 @@ Extension subsets:
  * **ZftransAdv**: much more complex to implement in hardware
  * **Zfrsqrt**: Reciprocal square-root.
  
-Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Ztrignpi, Zarctrigpi,
-Zarctrignpi
+Minimum recommended requirements for 3D: Zftrans, Ztrignpi,
+Zarctrignpi, with Ztrigpi and Zarctrigpi as augmentations.
  
-Minimum recommended requirements for Mobile-Embedded 3D: Ztrigpi, Zftrans, Ztrignpi
+Minimum recommended requirements for Mobile-Embedded 3D: Ztrignpi, Zftrans, with Ztrigpi as an augmentation.
  
  # TODO:
  
@@ -54,7 +58,7 @@ Minimum recommended requirements for Mobile-Embedded 3D: Ztrigpi, Zftrans, Ztrig
  
  This proposal is designed to meet a wide range of extremely diverse needs,
  allowing implementors from all of them to benefit from the tools and hardware
-cost reductions associated with common standards adoption.
+cost reductions associated with common standards adoption in RISC-V (primarily IEEE754 and Vulkan).
  
  **There are *four* different, disparate platforms' needs (two new)**:
  
@@ -210,11 +214,14 @@ Any deviation from Trademarked Standards means that an implementation
  may not be sold and also make a claim of being, for example, "Vulkan
  compatible".
  
-This in turn reinforces and makes a hard requirement a need for public
+For 3D, this in turn reinforces and makes a hard requirement a need for public
  compliance with such standards, over-and-above what would otherwise be
  set by a RISC-V Standards Development Process, including both the
  software compliance and the knock-on implications that has for hardware.
  
+For libraries such as libm and numpy, accuracy is paramount for software interoperability across multiple platforms. Some algorithms, for example, critically rely on correct IEEE754 results.
+The conflicting accuracy requirements can be met through the zfpacc extension.
+
  **Collaboration**:
  
  The case for collaboration on any Extension is already well-known.
@@ -310,15 +317,19 @@ and is **not** an appropriate methodology for use in other Extensions
  with huge (non-uniform) market diversity even with similarly large
  numbers of potential opcodes.  BitManip is the perfect counter-example.
  
-# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>
+# Proposed Opcodes vs Khronos OpenCL vs IEEE754-2019 <a name="khronos_equiv"></a>
+
+This list shows the (direct) equivalence between proposed opcodes,
+their Khronos OpenCL equivalents, and their IEEE754-2019 equivalents.
+98% of the opcodes in this proposal that are in the IEEE754-2019 standard
+are present in the Khronos Extended Instruction Set.
  
-This list shows the (direct) equivalence between proposed opcodes and
-their Khronos OpenCL equivalents.
  For RISCV opcode encodings see 
  [[rv_major_opcode_1010011]]
  
  See
  <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
+and <https://ieeexplore.ieee.org/document/8766229>
  
  * Special FP16 opcodes are *not* being proposed, except by indirect / inherent
    use of the "fmt" field that is already present in the RISC-V Specification.
@@ -334,47 +345,62 @@ Khronos Specification accuracy requirements - is not an option, as it
  results in non-compliance, and the vendor may not use the Trademarked words
  "Vulkan" etc. in conjunction with their product.
  
+IEEE754-2019 Table 9.1 lists "additional mathematical operations".
+Interestingly, the only functions in that table with no OpenCL equivalent are
+compound, exp2m1, exp10m1, log2p1 and log10p1.
+
  [[!table data="""
-Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
-FSIN            | sin         | half\_sin   | native\_sin   | NONE        |
-FCOS            | cos         | half\_cos   | native\_cos   | NONE        |
-FTAN            | tan         | half\_tan   | native\_tan   | NONE        |
-NONE (1)        | sincos      | NONE        | NONE          | NONE        |
-FASIN           | asin        | NONE        | NONE          | NONE        |
-FACOS           | acos        | NONE        | NONE          | NONE        |
-FATAN           | atan        | NONE        | NONE          | NONE        |
-FSINPI          | sinpi       | NONE        | NONE          | NONE        |
-FCOSPI          | cospi       | NONE        | NONE          | NONE        |
-FTANPI          | tanpi       | NONE        | NONE          | NONE        |
-FASINPI         | asinpi      | NONE        | NONE          | NONE        |
-FACOSPI         | acospi      | NONE        | NONE          | NONE        |
-FATANPI         | atanpi      | NONE        | NONE          | NONE        |
-FSINH           | sinh        | NONE        | NONE          | NONE        |
-FCOSH           | cosh        | NONE        | NONE          | NONE        |
-FTANH           | tanh        | NONE        | NONE          | NONE        |
-FASINH          | asinh       | NONE        | NONE          | NONE        |
-FACOSH          | acosh       | NONE        | NONE          | NONE        |
-FATANH          | atanh       | NONE        | NONE          | NONE        |
-FRSQRT          | rsqrt       | half\_rsqrt | native\_rsqrt | NONE        |
-FCBRT           | cbrt        | NONE        | NONE          | NONE        |
-FEXP2           | exp2        | half\_exp2  | native\_exp2  | NONE        |
-FLOG2           | log2        | half\_log2  | native\_log2  | NONE        |
-FEXPM1          | expm1       | NONE        | NONE          | NONE        |
-FLOG1P          | log1p       | NONE        | NONE          | NONE        |
-FEXP            | exp         | half\_exp   | native\_exp   | NONE        |
-FLOG            | log         | half\_log   | native\_log   | NONE        |
-FEXP10          | exp10       | half\_exp10 | native\_exp10 | NONE        |
-FLOG10          | log10       | half\_log10 | native\_log10 | NONE        |
-FATAN2          | atan2       | NONE        | NONE          | NONE        |
-FATAN2PI        | atan2pi     | NONE        | NONE          | NONE        |
-FPOW            | pow         | NONE        | NONE          | NONE        |
-FROOT           | rootn       | NONE        | NONE          | NONE        |
-FHYPOT          | hypot       | NONE        | NONE          | NONE        |
-FRECIP          | NONE        | half\_recip | native\_recip | NONE        |
+opcode   | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast | IEEE754  |
+FSIN     | sin         | half\_sin   | native\_sin   | NONE        | sin      |
+FCOS     | cos         | half\_cos   | native\_cos   | NONE        | cos      |
+FTAN     | tan         | half\_tan   | native\_tan   | NONE        | tan      |
+NONE (1) | sincos      | NONE        | NONE          | NONE        | NONE     |
+FASIN    | asin        | NONE        | NONE          | NONE        | asin     |
+FACOS    | acos        | NONE        | NONE          | NONE        | acos     |
+FATAN    | atan        | NONE        | NONE          | NONE        | atan     |
+FSINPI   | sinpi       | NONE        | NONE          | NONE        | sinPi    |
+FCOSPI   | cospi       | NONE        | NONE          | NONE        | cosPi    |
+FTANPI   | tanpi       | NONE        | NONE          | NONE        | tanPi    |
+FASINPI  | asinpi      | NONE        | NONE          | NONE        | asinPi   |
+FACOSPI  | acospi      | NONE        | NONE          | NONE        | acosPi   |
+FATANPI  | atanpi      | NONE        | NONE          | NONE        | atanPi   |
+FSINH    | sinh        | NONE        | NONE          | NONE        | sinh     |
+FCOSH    | cosh        | NONE        | NONE          | NONE        | cosh     |
+FTANH    | tanh        | NONE        | NONE          | NONE        | tanh     |
+FASINH   | asinh       | NONE        | NONE          | NONE        | asinh    |
+FACOSH   | acosh       | NONE        | NONE          | NONE        | acosh    |
+FATANH   | atanh       | NONE        | NONE          | NONE        | atanh    |
+FATAN2   | atan2       | NONE        | NONE          | NONE        | atan2    |
+FATAN2PI | atan2pi     | NONE        | NONE          | NONE        | atan2Pi  |
+FRSQRT   | rsqrt       | half\_rsqrt | native\_rsqrt | NONE        | rSqrt    |
+FCBRT    | cbrt        | NONE        | NONE          | NONE        | NONE (2) |
+FEXP2    | exp2        | half\_exp2  | native\_exp2  | NONE        | exp2     |
+FLOG2    | log2        | half\_log2  | native\_log2  | NONE        | log2     |
+FEXPM1   | expm1       | NONE        | NONE          | NONE        | expm1    |
+FLOG1P   | log1p       | NONE        | NONE          | NONE        | logp1    |
+FEXP     | exp         | half\_exp   | native\_exp   | NONE        | exp      |
+FLOG     | log         | half\_log   | native\_log   | NONE        | log      |
+FEXP10   | exp10       | half\_exp10 | native\_exp10 | NONE        | exp10    |
+FLOG10   | log10       | half\_log10 | native\_log10 | NONE        | log10    |
+FPOW     | pow         | NONE        | NONE          | NONE        | pow      |
+FPOWN    | pown        | NONE        | NONE          | NONE        | pown     |
+FPOWR    | powr        | half\_powr  | native\_powr  | NONE        | powr     |
+FROOTN   | rootn       | NONE        | NONE          | NONE        | rootn    |
+FHYPOT   | hypot       | NONE        | NONE          | NONE        | hypot    |
+FRECIP   | NONE        | half\_recip | native\_recip | NONE        | NONE (3) |
+NONE     | NONE        | NONE        | NONE          | NONE        | compound |
+NONE     | NONE        | NONE        | NONE          | NONE        | exp2m1   |
+NONE     | NONE        | NONE        | NONE          | NONE        | exp10m1  |
+NONE     | NONE        | NONE        | NONE          | NONE        | log2p1   |
+NONE     | NONE        | NONE        | NONE          | NONE        | log10p1  |
  """]]
  
  Note (1) FSINCOS is macro-op fused (see below).
  
+Note (2) synthesised in IEEE754-2019 as "rootn(x, 3)"
+
+Note (3) synthesised in IEEE754-2019 using "1.0 / x"
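
Notes (2) and (3) describe operations that IEEE754-2019 does not name directly but that can be built from ones it does. The following is a minimal Python sketch of that idea, assuming FCBRT computes the real cube root and FRECIP computes the reciprocal. The helper names `rootn`, `cbrt` and `recip` are hypothetical illustrations, not part of the proposal, and the sketch makes no attempt at IEEE754-correct rounding or at the standard's handling of zeros, infinities and NaNs.

```python
import math


def rootn(x: float, n: int) -> float:
    """Hypothetical reference for the real n-th root of x (illustration only)."""
    if n == 0:
        raise ValueError("rootn is undefined for n == 0")
    if x < 0.0:
        if n % 2 == 0:
            # An even root of a negative number has no real result.
            return math.nan
        # Odd root of a negative number: take the root of |x| and negate.
        return -((-x) ** (1.0 / n))
    return x ** (1.0 / n)


def cbrt(x: float) -> float:
    """Cube root expressed as a third root (assumed meaning of FCBRT)."""
    return rootn(x, 3)


def recip(x: float) -> float:
    """Reciprocal expressed as a plain division (assumed meaning of FRECIP)."""
    return 1.0 / x
```

Because the results go through a floating-point exponent of `1.0 / n`, they may differ from a correctly rounded cube root in the last bit, which is one reason a dedicated opcode can be preferable to synthesis.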
+
  ## List of 2-arg opcodes
  
  [[!table  data="""
@@ -382,7 +408,9 @@ opcode    | Description            | pseudocode                 | Extension   |
  FATAN2    | atan2 arc tangent      | rd = atan2(rs2, rs1)       | Zarctrignpi |
  FATAN2PI  | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi  | Zarctrigpi  |
  FPOW      | x power of y           | rd = pow(rs1, rs2)         | ZftransAdv  |
-FROOT     | x power 1/y            | rd = pow(rs1, 1/rs2)       | ZftransAdv  |
+FPOWN     | x power of n (n int)   | rd = pow(rs1, rs2)         | ZftransAdv  |
+FPOWR     | x power of y (x +ve)   | rd = exp(rs2 * log(rs1))   | ZftransAdv  |
+FROOTN    | x power 1/n (n integer)| rd = pow(rs1, 1/rs2)       | ZftransAdv  |
  FHYPOT    | hypotenuse             | rd = sqrt(rs1^2 + rs2^2)   | ZftransAdv  |
  """]]
  
@@ -415,7 +443,6 @@ FACOS       | arccos (radians)         | rd = acos(rs1)          | Zarctrignpi |
  FATAN       | arctan (radians)         | rd = atan(rs1)          | Zarctrignpi |
  FSINPI      | sin times pi             | rd = sin(pi * rs1)      | Ztrigpi |
  FCOSPI      | cos times pi             | rd = cos(pi * rs1)      | Ztrigpi |
-
  FTANPI      | tan times pi             | rd = tan(pi * rs1)      | Ztrigpi |
  FASINPI     | arcsin / pi              | rd = asin(rs1) / pi     | Zarctrigpi |
  FACOSPI     | arccos / pi              | rd = acos(rs1) / pi     | Zarctrigpi |
@@ -446,8 +473,8 @@ following opcodes:
      F3 - fsqrt (square root)
      F4 - fexp2 (2^x)
      F5 - flog2
-    F6 - fsin
-    F7 - fcos
+    F6 - fsin1pi
+    F7 - fcos1pi
      F9 - fatan_pt1
  
  These in FP32 and FP16 only: no FP64 hardware, at all.
@@ -465,13 +492,13 @@ It also has fast variants of some of these, as a CSR Mode.
  AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the
  RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
  
-    COS (appx)
+    COS2PI (appx)
      EXP2
      LOG (IEEE754)
      RECIP
      RSQRT
      SQRT
-    SIN (appx)
+    SIN2PI (appx)
  
  AMD RDNA has F16 and F32 variants of all the above, and also has F64
  variants of SQRT, RSQRT and RECIP.  It is interesting that even the
@@ -536,12 +563,10 @@ HPC and high-end GPUs are likely markets for these.
  
  ### ZftransAdv
  
-CBRT, POW, ROOT (inverse of POW): these are simply much more complex
-to implement in hardware, and typically will only be put into HPC
-applications.
+CBRT, POW, POWN, POWR, ROOTN
  
-ROOT is included as well as POW because at the extreme ranges one is
-more accurate than the other.
+These are simply much more complex to implement in hardware, and typically
+will only be put into HPC applications.
  
  * **Zfrsqrt**: Reciprocal square-root.