* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
+* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.
+* [[zfpacc_proposal]] for accuracy settings proposal
Extension subsets:
* **Zftrans**: standard transcendentals (best suited to 3D)
-* **ZftransExt**: extra functions (useful, not generally needed for 3D, can be synthesised using Ztrans)
+* **ZftransExt**: extra functions (useful, not generally needed for 3D,
+ can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi sinpi cospi tanpi
* **Ztrignpi**: trig non-xxx-pi sin cos tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi asinpi acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
-* **Zfhyp**: hyperbolic/inverse-hyperbolic. sinh, cosh, tanh, asinh, acosh, atanh
+* **Zfhyp**: hyperbolic/inverse-hyperbolic. sinh, cosh, tanh, asinh,
+ acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
+* **Zfrsqrt**: Reciprocal square-root.
Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi
[[!toc levels=2]]
+# TODO:
+
+* Decision on accuracy, moved to [[zfpacc_proposal]]
+<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
+* Errors **MUST** be repeatable.
+* How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
+<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
+ Accuracy requirements for dual (triple) purpose implementations must
+ meet the higher standard.
+* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
+ it is desirable on its own by other implementors. This is to be evaluated.
+
+# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>
+
+This list shows the (direct) equivalence between proposed opcodes and
+their Khronos OpenCL equivalents.
+
+See
+<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
+
+Special FP16 opcodes are *not* being proposed, except by indirect / inherent
+use of the "fmt" field that is already present in the RISC-V Specification.
+
+"Native" opcodes are *not* being proposed: implementors will be expected
+to use the (equivalent) proposed opcode covering the same function.
+
+"Fast" opcodes are *not* being proposed, because the Khronos Specification
+fast\_length, fast\_normalize and fast\_distance OpenCL opcodes require
+vectors (or can be done as scalar operations using other RISC-V instructions).
+
+The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
+Deviation from conformance with the Khronos Specification - including the
+Khronos Specification accuracy requirements - is not an option.
+
+[[!table data="""
+Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
+FSIN | sin | half\_sin | native\_sin | NONE |
+FCOS | cos | half\_cos | native\_cos | NONE |
+FTAN | tan | half\_tan | native\_tan | NONE |
+FASIN | asin | NONE | NONE | NONE |
+FACOS | acos | NONE | NONE | NONE |
+FSINPI | sinpi | NONE | NONE | NONE |
+FCOSPI | cospi | NONE | NONE | NONE |
+FTANPI | tanpi | NONE | NONE | NONE |
+FASINPI | asinpi | NONE | NONE | NONE |
+FACOSPI | acospi | NONE | NONE | NONE |
+FATANPI | atanpi | NONE | NONE | NONE |
+FSINH | sinh | NONE | NONE | NONE |
+FCOSH | cosh | NONE | NONE | NONE |
+FTANH | tanh | NONE | NONE | NONE |
+FASINH | asinh | NONE | NONE | NONE |
+FACOSH | acosh | NONE | NONE | NONE |
+FATANH | atanh | NONE | NONE | NONE |
+FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE |
+FCBRT | cbrt | NONE | NONE | NONE |
+FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE |
+FLOG2 | log2 | half\_log2 | native\_log2 | NONE |
+FEXPM1 (1) | expm1 | NONE | NONE | NONE |
+FLOG1P (1) | log1p | NONE | NONE | NONE |
+FEXP (1) | exp | half\_exp | native\_exp | NONE |
+FLOG (1) | log | half\_log | native\_log | NONE |
+FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE |
+FLOG10 | log10 | half\_log10 | native\_log10 | NONE |
+FATAN2 | atan2 | NONE | NONE | NONE |
+FATAN2PI | atan2pi | NONE | NONE | NONE |
+FPOW | pow | NONE | NONE | NONE |
+FROOT | rootn | NONE | NONE | NONE |
+FHYPOT | hypot | NONE | NONE | NONE |
+"""]]
+
+Note (1): See "synthesis", below. FEXP and FEXPM1 may each be synthesised
+in terms of the other, as may FLOG and FLOG1P. FEXPM1 and FLOG1P are more
+accurate for arguments close to zero, so it is likely that FLOG and FEXP
+will be removed.
+
# List of 2-arg opcodes
[[!table data="""
FATAN2PI | atan arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
-FHYPOT | hypotenuse | rd = sqrt(x^2 + y^2) | Zftrans |
+FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
"""]]
# List of 1-arg transcendental opcodes
[[!table data="""
opcode | Description | pseudo-code | Extension |
+FRSQRT | Reciprocal Square-root | rd = 1 / sqrt(rs1) | Zfrsqrt |
FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | Zftrans |
FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
FLOG2 | log2 | rd = log2(rs1) | Zftrans |
FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp |
"""]]
-# Pseudo-code ops and macro-ops
+# Synthesis, Pseudo-code ops and macro-ops
The pseudo-ops are best left up to the compiler rather than being actual
pseudo-ops, by allocating one scalar FP register for use as a constant
fmv.w.x ft0, t0
fatan2pi.s rd, rs1, ft0
+Hyperbolic function example (obviates need for Zfhyp except for high-performance):
+
+ ASINH( x ) = ln( x + SQRT(x**2+1) )
+
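The identity above can be checked numerically. The following is a minimal sketch in Python, assuming FP64 arithmetic from the standard `math` module rather than FP32 hardware; the name `asinh_synth` is hypothetical and used only for illustration.

```python
import math

def asinh_synth(x):
    # ASINH(x) = ln(x + SQRT(x**2 + 1)), built from log and sqrt only
    return math.log(x + math.sqrt(x * x + 1.0))

# Compare the synthesised form against the library asinh
for x in (0.5, 1.0, 10.0):
    print(x, asinh_synth(x), math.asinh(x))
```

For large negative x the sum `x + sqrt(x**2 + 1)` suffers cancellation, so a straightforward synthesis like this one loses accuracy there. That may be one reason a dedicated opcode remains attractive for some implementations.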
+LOG / LOG1P example:
+
+ LOG(x) = LOG1P(x - 1.0)
+ EXP(x) = EXPM1(x) + 1.0
+
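Using the standard definitions log1p(x) = ln(1 + x) and expm1(x) = exp(x) - 1, each function of a pair can be written in terms of the other. The sketch below illustrates this in Python, assuming FP64 arithmetic from the `math` module; the helper names are hypothetical. It also shows why the log1p form is preferred near zero.

```python
import math

def log_via_log1p(x):
    # log1p(y) = ln(1 + y), so ln(x) = log1p(x - 1.0)
    return math.log1p(x - 1.0)

def exp_via_expm1(x):
    # expm1(x) = exp(x) - 1, so exp(x) = expm1(x) + 1.0
    return math.expm1(x) + 1.0

def log1p_via_log(x):
    # The reverse direction: 1.0 + x rounds to 1.0 for tiny x
    return math.log(1.0 + x)

tiny = 1e-20
print(math.log1p(tiny))     # keeps the small result
print(log1p_via_log(tiny))  # 1.0 + tiny == 1.0, so this is 0.0
```

Synthesising LOG from LOG1P costs one extra subtraction, whereas synthesising LOG1P from LOG loses the result entirely for small arguments.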
+# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?
+
+RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
+Research needed to ensure that implementors are not compromised by such
+a decision.
+<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>
+