# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.
* [[zfpacc_proposal]] for accuracy settings proposal

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic. sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy, moved to [[zfpacc_proposal]]
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  An implementation that serves two (or three) of these purposes must
  meet the strictest accuracy requirement among them.
* Reciprocal Square-root is in its own separate extension (Zfrsqrt)
  because other implementors may want it on its own. This is to be evaluated.

# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>

This list shows the (direct) equivalence between proposed opcodes and
their Khronos OpenCL equivalents.

See
<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>

Special FP16 opcodes are *not* being proposed, except by indirect / inherent
use of the "fmt" field that is already present in the RISC-V Specification.

"Native" opcodes are *not* being proposed: implementors will be expected
to use the (equivalent) proposed opcode covering the same function.

"Fast" opcodes are *not* being proposed, because the Khronos Specification
fast\_length, fast\_normalize and fast\_distance OpenCL opcodes require
vectors (or can be done as scalar operations using other RISC-V instructions).

The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
Deviation from conformance with the Khronos Specification - including the
Khronos Specification accuracy requirements - is not an option.

[[!table data="""
Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
FSIN | sin | half\_sin | native\_sin | NONE |
FCOS | cos | half\_cos | native\_cos | NONE |
FTAN | tan | half\_tan | native\_tan | NONE |
FASIN | asin | NONE | NONE | NONE |
FACOS | acos | NONE | NONE | NONE |
NONE (3) | atan | NONE | NONE | NONE |
FSINPI | sinpi | NONE | NONE | NONE |
FCOSPI | cospi | NONE | NONE | NONE |
FTANPI | tanpi | NONE | NONE | NONE |
FASINPI | asinpi | NONE | NONE | NONE |
FACOSPI | acospi | NONE | NONE | NONE |
NONE (2) | atanpi | NONE | NONE | NONE |
FSINH | sinh | NONE | NONE | NONE |
FCOSH | cosh | NONE | NONE | NONE |
FTANH | tanh | NONE | NONE | NONE |
FASINH | asinh | NONE | NONE | NONE |
FACOSH | acosh | NONE | NONE | NONE |
FATANH | atanh | NONE | NONE | NONE |
FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE |
FCBRT | cbrt | NONE | NONE | NONE |
FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE |
FLOG2 | log2 | half\_log2 | native\_log2 | NONE |
FEXPM1 (1) | expm1 | NONE | NONE | NONE |
FLOG1P (1) | log1p | NONE | NONE | NONE |
FEXP (1) | exp | half\_exp | native\_exp | NONE |
FLOG (1) | log | half\_log | native\_log | NONE |
FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE |
FLOG10 | log10 | half\_log10 | native\_log10 | NONE |
FATAN2 | atan2 | NONE | NONE | NONE |
FATAN2PI | atan2pi | NONE | NONE | NONE |
FPOW | pow | NONE | NONE | NONE |
FROOT | rootn | NONE | NONE | NONE |
FHYPOT | hypot | NONE | NONE | NONE |
"""]]

Note (1): See "synthesis", below. FEXP may be synthesised from FEXPM1,
and FLOG from FLOG1P (and vice versa). FEXPM1 and FLOG1P are more accurate
for arguments near zero. It is therefore likely that FLOG and FEXP will
be removed.

Note (2): FATANPI is a synthesised alias, below.

Note (3): FATAN is a synthesised alias, below.

# List of 2-arg opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
"""]]

# List of 1-arg transcendental opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FRSQRT | Reciprocal Square-root | rd = 1.0 / sqrt(rs1) | Zfrsqrt |
FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | Zftrans |
FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
FLOG2 | log2 | rd = log2(rs1) | Zftrans |
FEXPM1 | exponential minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans |
FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | Zftrans |
FEXP | exponential | rd = pow(e, rs1) | ZftransExt |
FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt |
FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt |
FLOG10 | log base 10 | rd = log10(rs1) | ZftransExt |
"""]]

# List of 1-arg trigonometric opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FSIN | sin (radians) | rd = sin(rs1) | Ztrignpi |
FCOS | cos (radians) | rd = cos(rs1) | Ztrignpi |
FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi |
FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi |
FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi |
FSINH | hyperbolic sin (radians) | rd = sinh(rs1) | Zfhyp |
FCOSH | hyperbolic cos (radians) | rd = cosh(rs1) | Zfhyp |
FTANH | hyperbolic tan (radians) | rd = tanh(rs1) | Zfhyp |
FASINH | inverse hyperbolic sin | rd = asinh(rs1) | Zfhyp |
FACOSH | inverse hyperbolic cos | rd = acosh(rs1) | Zfhyp |
FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp |
"""]]

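As an illustration of the `sin(pi * rs1)` pseudo-code, here is a minimal Python sketch of a software reference model for FSINPI. The function name `sinpi_reference`, the reduction strategy and the sign chosen for zero results are assumptions of this sketch, not part of the proposal. It shows one observable difference from evaluating the expression literally in binary floating point: `pi * rs1` is rounded before the sine is taken, so the literal form does not return zero at integer arguments.

```python
import math

def sinpi_reference(x: float) -> float:
    """Hypothetical reference model for FSINPI: sin(pi * x)."""
    # fmod is exact in IEEE 754 arithmetic, so reducing modulo the
    # period (2.0) does not introduce any rounding error.
    r = math.fmod(x, 2.0)  # r lies in (-2, 2) and has the sign of x
    if r == 0.0 or abs(r) == 1.0:
        # sin(pi * n) is mathematically zero for every integer n.
        # Giving the zero the sign of the input is a choice made here.
        return math.copysign(0.0, x)
    return math.sin(math.pi * r)

literal = math.sin(math.pi * 1.0)   # small but non-zero
reduced = sinpi_reference(1.0)      # zero
```
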
# Synthesis, Pseudo-code ops and macro-ops

The aliases below are best left to the compiler rather than defined as
actual pseudo-ops: the compiler allocates one scalar FP register as a
constant (loop invariant) set to "1.0" at the beginning of a function or
other suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0) - FATAN2
* FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0) - FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).

FATANPI example pseudo-code:

    lui t0, 0x3F800 // upper bits of f32 1.0
    fmv.s.x ft0, t0 // move the integer bit pattern into an FP register
    fatan2pi.s rd, rs1, ft0

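The constant in the example above can be checked with a short Python sketch. It models `lui` as placing a 20-bit immediate in bits 31..12, reinterprets the result as an IEEE 754 binary32 value, and then evaluates the alias as written in the list above, `atan2pi(rs1, 1.0)`. The helper names are hypothetical. Note that the operand order here follows the alias text; the 2-arg opcode table gives the pseudo-code as `atan2(rs2, rs1)`, so the order in the sketch is an assumption.

```python
import math
import struct

def lui(imm20: int) -> int:
    """Model of LUI on RV32: the 20-bit immediate lands in bits 31..12."""
    return (imm20 & 0xFFFFF) << 12

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fatan2pi_model(a: float, b: float) -> float:
    """Hypothetical model of the alias text: atan2pi(a, b) = atan2(a, b) / pi."""
    return math.atan2(a, b) / math.pi

def fatanpi(x: float) -> float:
    """FATANPI synthesised from FATAN2PI and a register holding 1.0."""
    one = f32_from_bits(lui(0x3F800))
    return fatan2pi_model(x, one)
```
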
Hyperbolic function example (obviates the need for Zfhyp except for
high-performance or correctly-rounded implementations):

    ASINH( x ) = ln( x + SQRT(x**2+1))

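A minimal Python sketch of this synthesis, evaluated literally in double precision and compared against the library `math.asinh`. The names `asinh_synth` and `rel_err` are hypothetical. The sketch suggests why a dedicated opcode may still be wanted when accuracy matters: the literal formula agrees closely for moderate arguments but loses most of its significant digits for small ones, because `x + 1` rounds away the low bits of `x`.

```python
import math

def asinh_synth(x: float) -> float:
    """ASINH(x) = ln(x + sqrt(x**2 + 1)), evaluated term by term."""
    return math.log(x + math.sqrt(x * x + 1.0))

def rel_err(approx: float, exact: float) -> float:
    """Relative error of approx with respect to exact."""
    return abs(approx - exact) / abs(exact)

# Moderate argument: the two results agree to nearly full precision.
err_moderate = rel_err(asinh_synth(2.0), math.asinh(2.0))

# Small argument: the synthesised form is noticeably less accurate.
err_small = rel_err(asinh_synth(1e-9), math.asinh(1e-9))
```
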
LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0

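The two identities can be sketched in Python using the standard `math.log1p` and `math.expm1` functions. The wrapper names are hypothetical. The sketch also evaluates the reverse direction (EXPM1 from EXP, LOG1P from LOG), which is where accuracy is lost for arguments near zero.

```python
import math

def log_via_log1p(x: float) -> float:
    return math.log1p(x - 1.0)      # LOG(x) = LOG1P(x - 1.0)

def exp_via_expm1(x: float) -> float:
    return math.expm1(x) + 1.0      # EXP(x) = EXPM1(x) + 1.0

def expm1_via_exp(x: float) -> float:
    return math.exp(x) - 1.0        # reverse direction

def log1p_via_log(x: float) -> float:
    return math.log(1.0 + x)        # reverse direction

def rel_err(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)

tiny = 1e-10
err_expm1 = rel_err(expm1_via_exp(tiny), math.expm1(tiny))
err_log1p = rel_err(log1p_via_log(tiny), math.log1p(tiny))
```

In double precision the forward identities agree with `math.log` and `math.exp` to within a few units in the last place for the arguments tried here, while the reverse forms lose roughly half of their significant digits at `tiny`.
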
# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

The RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision:
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

A correctly-rounded LOG will return different results from LOG1P plus an ADD,
because the two-instruction sequence rounds twice. Likewise for EXP and EXPM1.

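The effect can be sketched in Python by simulating FP32 with `struct`. This sketch makes two assumptions: a double-precision result rounded once to binary32 stands in for a correctly-rounded FEXP (this is only an approximation of correct rounding), and the helper names are hypothetical. The synthesised path rounds the EXPM1 result to binary32 and then rounds again in the add.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def exp_direct_f32(x: float) -> float:
    # Stand-in for a correctly-rounded FEXP: a single rounding step.
    return to_f32(math.exp(x))

def exp_synth_f32(x: float) -> float:
    # FEXPM1 result rounded to binary32, then an FP32 add of 1.0.
    # The binary64 sum is exact here, so the outer to_f32 models the
    # single rounding performed by the FP32 add.
    return to_f32(to_f32(math.expm1(x)) + 1.0)

def count_mismatches(n: int = 2000, lo: float = 0.41, hi: float = 0.69) -> int:
    """Count FP32 inputs in [lo, hi) where the two paths disagree."""
    mismatches = 0
    for i in range(n):
        x = to_f32(lo + (hi - lo) * i / n)
        if exp_direct_f32(x) != exp_synth_f32(x):
            mismatches += 1
    return mismatches

mismatches = count_mismatches()
```

The sweep covers inputs where EXPM1 lies between 0.5 and 1.0. There the intermediate result has twice the resolution of the final sum, so the add can land exactly on a tie and round the wrong way.
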