ztrans_proposal.mdwn

# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic: sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: reciprocal square-root

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy:
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about three Platform Specifications? 3D, UNIX and Embedded?
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Accuracy requirements for dual (triple) purpose implementations must
  meet the higher standard.
* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  it is desirable on its own to other implementors.  This is to be evaluated.

# List of 2-arg opcodes

[[!table  data="""
opcode    | Description            | pseudo-code                | Extension   |
FATAN2    | atan2 arc tangent      | rd = atan2(rs2, rs1)       | Zarctrignpi |
FATAN2PI  | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi  | Zarctrigpi  |
FPOW      | x power of y           | rd = pow(rs1, rs2)         | ZftransAdv  |
FROOT     | x power 1/y            | rd = pow(rs1, 1/rs2)       | ZftransAdv  |
FHYPOT    | hypotenuse             | rd = sqrt(rs1^2 + rs2^2)   | Zftrans     |
"""]]

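The table rows above can be read as the following reference model. This is only a sketch in Python on binary64 values, not a statement about accuracy or about the final operand order: the function names are hypothetical and the argument order simply follows the pseudo-code column as written.

```python
import math

# Hypothetical reference models; operand order copied from the table's
# pseudo-code column (rs1 first, rs2 second).

def fatan2(rs1: float, rs2: float) -> float:
    # rd = atan2(rs2, rs1)
    return math.atan2(rs2, rs1)

def fatan2pi(rs1: float, rs2: float) -> float:
    # rd = atan2(rs2, rs1) / pi, so the result is in half-turns, range [-1, 1]
    return math.atan2(rs2, rs1) / math.pi

def fpow(rs1: float, rs2: float) -> float:
    # rd = pow(rs1, rs2)
    return math.pow(rs1, rs2)

def froot(rs1: float, rs2: float) -> float:
    # rd = pow(rs1, 1/rs2)
    return math.pow(rs1, 1.0 / rs2)

def fhypot(rs1: float, rs2: float) -> float:
    # rd = sqrt(rs1^2 + rs2^2), computed without squaring overflow
    return math.hypot(rs1, rs2)
```

One reason a dedicated hypotenuse operation can be attractive: in binary64, `math.sqrt(1e200 * 1e200 + 1e200 * 1e200)` overflows to infinity in the intermediate squares, whereas `fhypot(1e200, 1e200)` stays finite.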
# List of 1-arg transcendental opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension  |
FRSQRT   | Reciprocal Square-root   | rd = 1.0 / sqrt(rs1)    | Zfrsqrt    |
FCBRT    | Cube Root                | rd = pow(rs1, 1.0 / 3)  | Zftrans    |
FEXP2    | power-of-2               | rd = pow(2, rs1)        | Zftrans    |
FLOG2    | log2                     | rd = log2(rs1)          | Zftrans    |
FEXPM1   | exponent minus 1         | rd = pow(e, rs1) - 1.0  | Zftrans    |
FLOG1P   | log plus 1               | rd = log(e, 1 + rs1)    | Zftrans    |
FEXP     | exponent                 | rd = pow(e, rs1)        | ZftransExt |
FLOG     | natural log (base e)     | rd = log(e, rs1)        | ZftransExt |
FEXP10   | power-of-10              | rd = pow(10, rs1)       | ZftransExt |
FLOG10   | log base 10              | rd = log10(rs1)         | ZftransExt |
"""]]

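As a quick sanity check of the Zfrsqrt and Zftrans rows, the following Python sketch models each operation on binary64 values. The function names are hypothetical, and the sign handling in `fcbrt` is an assumption (a real cube root is defined for negative inputs, which a plain `pow` is not).

```python
import math

def frsqrt(x: float) -> float:
    # reciprocal square root: 1 / sqrt(x)
    return 1.0 / math.sqrt(x)

def fcbrt(x: float) -> float:
    # cube root: |x|^(1/3) with the sign of x restored (assumed behaviour)
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def fexp2(x: float) -> float:
    return 2.0 ** x

def flog2(x: float) -> float:
    return math.log2(x)

def fexpm1(x: float) -> float:
    # e^x - 1 evaluated without cancellation for small x
    return math.expm1(x)

def flog1p(x: float) -> float:
    # ln(1 + x) evaluated without rounding 1 + x first
    return math.log1p(x)
```

FEXPM1 and FLOG1P exist because the naive forms lose everything for small arguments: `math.exp(1e-20) - 1.0` is exactly zero in binary64, while `fexpm1(1e-20)` returns `1e-20`.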
# List of 1-arg trigonometric opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension   |
FSIN     | sin (radians)            | rd = sin(rs1)           | Ztrignpi    |
FCOS     | cos (radians)            | rd = cos(rs1)           | Ztrignpi    |
FTAN     | tan (radians)            | rd = tan(rs1)           | Ztrignpi    |
FASIN    | arcsin (radians)         | rd = asin(rs1)          | Zarctrignpi |
FACOS    | arccos (radians)         | rd = acos(rs1)          | Zarctrignpi |
FSINPI   | sin times pi             | rd = sin(pi * rs1)      | Ztrigpi     |
FCOSPI   | cos times pi             | rd = cos(pi * rs1)      | Ztrigpi     |
FTANPI   | tan times pi             | rd = tan(pi * rs1)      | Ztrigpi     |
FASINPI  | arcsin / pi              | rd = asin(rs1) / pi     | Zarctrigpi  |
FACOSPI  | arccos / pi              | rd = acos(rs1) / pi     | Zarctrigpi  |
FATANPI  | arctan / pi              | rd = atan(rs1) / pi     | Zarctrigpi  |
FSINH    | hyperbolic sin (radians) | rd = sinh(rs1)          | Zfhyp       |
FCOSH    | hyperbolic cos (radians) | rd = cosh(rs1)          | Zfhyp       |
FTANH    | hyperbolic tan (radians) | rd = tanh(rs1)          | Zfhyp       |
FASINH   | inverse hyperbolic sin   | rd = asinh(rs1)         | Zfhyp       |
FACOSH   | inverse hyperbolic cos   | rd = acosh(rs1)         | Zfhyp       |
FATANH   | inverse hyperbolic tan   | rd = atanh(rs1)         | Zfhyp       |
"""]]

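To illustrate the difference between the pi and non-pi variants, here is a minimal Python sketch. It assumes the OpenCL-style definitions (sinpi(x) = sin(pi * x), asinpi(x) = asin(x) / pi); the function names and the simple range reduction are hypothetical, and NaN/infinity handling is omitted.

```python
import math

def fsinpi(x: float) -> float:
    # Reduce in units of half-turns before multiplying by pi.
    # fmod is exact, so no accuracy is lost for large x.
    r = math.fmod(x, 2.0)
    if r in (-1.0, 0.0, 1.0):        # exact multiples of pi
        return 0.0
    return math.sin(math.pi * r)

def fcospi(x: float) -> float:
    r = math.fmod(abs(x), 2.0)       # cos is even; r is in [0, 2)
    if r in (0.5, 1.5):              # exact odd multiples of pi/2
        return 0.0
    return math.cos(math.pi * r)

def fasinpi(x: float) -> float:
    # inverse functions scale the result, not the argument
    return math.asin(x) / math.pi

def facospi(x: float) -> float:
    return math.acos(x) / math.pi

def fatanpi(x: float) -> float:
    return math.atan(x) / math.pi
```

With the argument expressed in half-turns, exact zeros are representable: `fsinpi(1.0)` is `0.0`, whereas `math.sin(math.pi * 1.0)` is a small non-zero value because pi itself is rounded.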
# Synthesis, Pseudo-code ops and macro-ops

The pseudo-ops are best left to the compiler rather than defined as actual
assembler pseudo-ops: the compiler allocates one scalar FP register as a
loop-invariant constant, set to "1.0" at the beginning of a function or
other suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0), using FATAN2
* FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0), using FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).

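The aliases listed above can be modelled as follows. This is a sketch only: `ONE` stands in for the scalar FP register that the compiler would keep set to 1.0, and the function names are hypothetical.

```python
import math

ONE = 1.0  # models the loop-invariant FP register holding 1.0

def frcp(rs1: float) -> float:
    # FRCP: an ordinary divide with the constant as dividend
    return ONE / rs1

def fatan(rs1: float) -> float:
    # FATAN: atan2 with the second argument fixed at 1.0
    return math.atan2(rs1, ONE)

def fatanpi(rs1: float) -> float:
    # FATANPI: the same alias, scaled to half-turns
    return math.atan2(rs1, ONE) / math.pi

def fsincos(rs1: float) -> tuple:
    # FSINCOS: both results from one argument, sin first and then cos
    return math.sin(rs1), math.cos(rs1)
```

Because atan2(x, 1.0) lies in the right half-plane for every finite x, it agrees with the single-argument arctangent, which is why no separate FATAN opcode is needed under this scheme.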
FATANPI example pseudo-code:

    lui t0, 0x3F800          # upper 20 bits of f32 1.0 (0x3F800000)
    fmv.w.x ft0, t0          # move the bit pattern into an FP register (older spelling: fmv.s.x)
    fatan2pi.s rd, rs1, ft0  # rd = atan2pi(rs1, 1.0)

Hypotenuse example (obviates need for Zfhyp except for high-performance),
where SQRT(x**2 + 1) is FHYPOT(x, 1.0):

    ASINH( x ) = ln( x + SQRT(x**2 + 1) )

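A minimal Python sketch of this synthesis, assuming FHYPOT and a natural log are available. The function name is hypothetical, and the use of odd symmetry for negative inputs is an added assumption to avoid cancellation in x + sqrt(x**2 + 1) when x is large and negative.

```python
import math

def asinh_via_hypot(x: float) -> float:
    # asinh is odd, so evaluate on |x| and restore the sign afterwards
    a = abs(x)
    # hypot(a, 1.0) models FHYPOT with the constant-1.0 register
    return math.copysign(math.log(a + math.hypot(a, 1.0)), x)
```

This simple form loses relative accuracy for very small |x|, where the log argument is close to 1; that is one case where a dedicated Zfhyp implementation may still be preferable.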
LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0

# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision:
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

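One concrete point for that research: synthesising LOG as LOG1P(x - 1.0) is fine for moderate x but breaks down for tiny x, because the subtraction rounds before the logarithm is taken. The Python sketch below shows this on binary64 values; the function names are hypothetical. Python raises `ValueError` for `log1p(-1.0)`, so the sketch maps that to the -infinity an IEEE 754 style implementation would be expected to return.

```python
import math

def log_via_log1p(x: float) -> float:
    # LOG(x) = LOG1P(x - 1.0): one subtract, then log1p
    try:
        return math.log1p(x - 1.0)
    except ValueError:
        # x - 1.0 rounded to exactly -1.0, i.e. ln(0)
        return -math.inf

def exp_via_expm1(x: float) -> float:
    # EXP(x) = EXPM1(x) + 1.0: expm1, then one add
    return math.expm1(x) + 1.0
```

For x = 1e-20 the true value of ln(x) is about -46.05, but `x - 1.0` rounds to exactly `-1.0`, so `log_via_log1p(1e-20)` returns negative infinity. The EXP direction does not have this problem, since adding 1.0 to an accurate expm1 result only incurs one final rounding.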
# Dynamic accuracy CSR

One possible solution is to add an extra field to the FP control CSR that
selects one of several accurate or fast modes:

- machine-learning-mode: as fast as possible
    -- may need additional requirements, such as monotonicity for atanh?
- GPU-mode: accurate to within a few ULP
    -- see the Vulkan, OpenGL, and OpenCL specs for accuracy guidelines
- almost-accurate-mode: accurate to <1 ULP
     (would 0.51 or some other value be better?)
- fully-accurate-mode: correctly rounded in all cases
- maybe more modes?
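
To make the mode boundaries concrete, the following Python sketch measures an error in ULPs and maps it to the strictest mode it could satisfy. Everything here is illustrative: the function names are hypothetical, the 4 ULP figure for GPU-mode is a placeholder for "a few ULP", and treating 0.5 ULP as the bound for correct rounding assumes round-to-nearest. It needs Python 3.9 or later for `math.ulp`, and it measures binary64 ULPs against a binary64 reference, whereas a real test would compare against a higher-precision reference.

```python
import math

def ulp_error(approx: float, exact: float) -> float:
    # distance between the two values, in units of the spacing at `exact`
    return abs(approx - exact) / math.ulp(exact)

def strictest_mode(err_ulps: float, gpu_bound: float = 4.0) -> str:
    # gpu_bound is a placeholder assumption, not a proposed value
    if err_ulps <= 0.5:
        return "fully-accurate-mode"
    if err_ulps < 1.0:
        return "almost-accurate-mode"
    if err_ulps <= gpu_bound:
        return "GPU-mode"
    return "machine-learning-mode"
```

For example, a result that differs from the reference 1.0 by one representable step has `ulp_error(1.0 + 2**-52, 1.0) == 1.0`, which under these placeholder bounds would only qualify for GPU-mode.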