# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic: sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: reciprocal square-root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi.

[[!toc levels=2]]

# TODO:

* Decision on accuracy
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about three Platform Specifications? 3D, UNIX and Embedded?
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Accuracy requirements for dual (triple) purpose implementations must
  meet the higher standard.
* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  it is desirable on its own by other implementors. This is to be evaluated.

# List of 2-arg opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
"""]]

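As a rough reference model of the table above, the pseudo-code column can be written out in Python. This is only a sketch: the function names are hypothetical, the operand order follows the pseudo-code column exactly as written, and Python's binary64 `math` functions stand in for whatever precision and accuracy a hardware implementation would provide.

```python
import math

# Hypothetical reference models; rs1 and rs2 are the two source operands.
def fatan2(rs1, rs2):
    return math.atan2(rs2, rs1)

def fatan2pi(rs1, rs2):
    return math.atan2(rs2, rs1) / math.pi

def fpow(rs1, rs2):
    return math.pow(rs1, rs2)

def froot(rs1, rs2):
    return math.pow(rs1, 1.0 / rs2)

def fhypot(rs1, rs2):
    # math.hypot avoids the intermediate overflow that a literal
    # sqrt(rs1**2 + rs2**2) could hit for large operands.
    return math.hypot(rs1, rs2)
```

For example, `fhypot(3.0, 4.0)` gives `5.0`, and `froot(8.0, 3.0)` is approximately `2.0`.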
# List of 1-arg transcendental opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FRSQRT | Reciprocal Square-root | rd = 1.0 / sqrt(rs1) | Zfrsqrt |
FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | Zftrans |
FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
FLOG2 | log2 | rd = log2(rs1) | Zftrans |
FEXPM1 | exponent minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans |
FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | Zftrans |
FEXP | exponent | rd = pow(e, rs1) | ZftransExt |
FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt |
FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt |
FLOG10 | log base 10 | rd = log10(rs1) | ZftransExt |
"""]]

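The extension list notes that the ZftransExt functions can be synthesised from Zftrans. One way to see this is through the usual change-of-base identities, sketched below in Python. The helper names are hypothetical, and a sequence built this way picks up extra rounding error from the multiply or divide, so this says nothing about the accuracy an implementation would actually achieve.

```python
import math

LOG2_E = math.log2(math.e)    # log2(e), a compile-time constant
LOG2_10 = math.log2(10.0)     # log2(10), a compile-time constant

def exp_via_exp2(x):
    # e**x == 2**(x * log2(e))
    return math.pow(2.0, x * LOG2_E)

def log_via_log2(x):
    # ln(x) == log2(x) / log2(e)
    return math.log2(x) / LOG2_E

def exp10_via_exp2(x):
    # 10**x == 2**(x * log2(10))
    return math.pow(2.0, x * LOG2_10)

def log10_via_log2(x):
    # log10(x) == log2(x) / log2(10)
    return math.log2(x) / LOG2_10
```

Each synthesised function costs one FEXP2 or FLOG2 plus one multiply (the divisions can be replaced by multiplying with a precomputed reciprocal constant).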
# List of 1-arg trigonometric opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FSIN | sin (radians) | rd = sin(rs1) | Ztrignpi |
FCOS | cos (radians) | rd = cos(rs1) | Ztrignpi |
FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi |
FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi |
FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi |
FSINH | hyperbolic sin (radians) | rd = sinh(rs1) | Zfhyp |
FCOSH | hyperbolic cos (radians) | rd = cosh(rs1) | Zfhyp |
FTANH | hyperbolic tan (radians) | rd = tanh(rs1) | Zfhyp |
FASINH | inverse hyperbolic sin | rd = asinh(rs1) | Zfhyp |
FACOSH | inverse hyperbolic cos | rd = acosh(rs1) | Zfhyp |
FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp |
"""]]

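In the OpenCL extended instruction set linked at the top of this page, the `pi` variants work in units of half-turns: the forward functions scale their input by pi, and the inverse functions scale their result by 1/pi (so `asinpi(x)` is `asin(x) / pi`, consistent with FATAN2PI above). A minimal Python model of that convention follows; the function names mirror OpenCL and are not part of any existing Python library.

```python
import math

def sinpi(x):
    return math.sin(math.pi * x)

def cospi(x):
    return math.cos(math.pi * x)

def asinpi(x):
    return math.asin(x) / math.pi

def acospi(x):
    return math.acos(x) / math.pi

def atanpi(x):
    return math.atan(x) / math.pi

# Forward and inverse are mutual inverses on the principal range.
round_trip = asinpi(sinpi(0.25))   # approximately 0.25
```

A naive model like this loses one benefit of dedicated hardware: `math.sin(math.pi * 1.0)` returns a tiny nonzero value because `math.pi` is itself rounded, whereas a native `sinpi` can reduce its argument exactly.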
# Synthesis, Pseudo-code ops and macro-ops

The pseudo-ops are best left up to the compiler rather than being actual
pseudo-ops, by allocating one scalar FP register for use as a constant
(loop invariant) set to "1.0" at the beginning of a function or other
suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0) - FATAN2
* FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0) - FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).

FATANPI example pseudo-code:

    lui t0, 0x3F800         // upper bits of f32 1.0
    fmv.w.x ft0, t0         // move the bit pattern from integer to FP register
    fatan2pi.s rd, rs1, ft0

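The `lui` immediate in this example works because `lui` places its 20-bit immediate in bits 31..12 of the destination register, so `0x3F800` becomes `0x3F800000`, the IEEE 754 binary32 encoding of 1.0. A quick Python check of that bit pattern (illustrative only; `f32_from_bits` is a hypothetical helper):

```python
import struct

def f32_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

lui_result = 0x3F800 << 12           # what `lui t0, 0x3F800` leaves in t0
one = f32_from_bits(lui_result)      # the value seen after moving t0 to ft0
```

Here `lui_result` is `0x3F800000` and `one` is `1.0`, so no load from a constant pool is needed to materialise the constant.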
Hypotenuse example (obviates need for Zfhyp except for high-performance):

    ASINH( x ) = ln( x + SQRT(x**2+1) )

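The square root in this identity is a hypotenuse with one side fixed at 1.0, so it maps onto a single FHYPOT with the constant-1.0 register followed by an add and a natural log. A small Python sketch of that sequence, checked against the library `asinh` (the function name is hypothetical):

```python
import math

def asinh_via_hypot(x):
    # sqrt(x**2 + 1) == hypot(x, 1.0): one FHYPOT against the 1.0 constant
    return math.log(x + math.hypot(x, 1.0))

samples = [0.5, 1.0, 10.0, -0.5]
errors = [abs(asinh_via_hypot(x) - math.asinh(x)) for x in samples]
```

This form is fine for moderate arguments, but it loses accuracy for very small `x` and for large negative `x` (where `x + hypot(x, 1.0)` cancels), which is one reason a dedicated Zfhyp implementation may still be wanted.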
LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0

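Since LOG1P(x) is ln(1 + x) and EXPM1(x) is e^x - 1, the plain functions are recovered as log(x) = log1p(x - 1) and exp(x) = expm1(x) + 1. The Python sketch below checks both identities and also shows why the `1p`/`m1` forms exist at all: they keep precision for arguments near zero. The helper names are hypothetical.

```python
import math

def log_via_log1p(x):
    return math.log1p(x - 1.0)

def exp_via_expm1(x):
    return math.expm1(x) + 1.0

tiny = 1e-18
naive = math.log(1.0 + tiny)    # 1.0 + tiny rounds to 1.0, so this is 0.0
accurate = math.log1p(tiny)     # keeps the small argument, roughly 1e-18
```

The reverse direction does not work as well: computing LOG1P from LOG requires forming `1.0 + x` first, which is exactly the step that discards the low bits in `naive`.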
# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

# Dynamic accuracy CSR

Maybe a solution would be to add an extra field to the FP control CSR to
allow selecting one of several accurate or fast modes:

- machine-learning-mode: fast as possible
    - maybe need additional requirements such as monotonicity for atanh?
- GPU-mode: accurate to within a few ULP
    - see Vulkan, OpenGL, and OpenCL specs for accuracy guidelines
- almost-accurate-mode: accurate to <1 ULP
  (would 0.51 or some other value be better?)
- fully-accurate-mode: correctly rounded in all cases
- maybe more modes?
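
The modes above are distinguished by error measured in ULPs (units in the last place). As a rough illustration of what such a bound means for binary32 results, the Python sketch below measures the ULP error of a candidate result against a higher-precision reference. All helper names are hypothetical, binary64 is assumed to be a good enough stand-in for the exact value, and only normal, nonzero values are handled.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp_f32(x):
    """Spacing between adjacent binary32 values at magnitude |x|."""
    _, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 24)   # binary32 has a 24-bit significand

def ulp_error_f32(result, exact):
    """Error of `result` relative to `exact`, in binary32 ULPs."""
    return abs(result - exact) / ulp_f32(exact)

exact = math.log2(3.0)
correctly_rounded = to_f32(exact)                  # at most 0.5 ULP away
two_off = correctly_rounded + 2 * ulp_f32(exact)   # a deliberately worse result
```

Under this measure `correctly_rounded` would satisfy every mode in the list, while `two_off` would pass only a "few ULP" bound and fail a "<1 ULP" one.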