+However, given that 3D revolves around Standards - DirectX, Vulkan, OpenGL,
+OpenCL - users have much more influence than first appears. Compliance
+with these standards is critical, as the userbase (games developers,
+scientific applications) expects not to have to rewrite extremely large
+and costly codebases to conform with *non-standards-compliant* hardware.
+
+Therefore, compliance with public APIs (Vulkan, OpenCL, OpenGL, DirectX)
+is paramount, and compliance with Trademarked Standards is critical.
+Any deviation from a Trademarked Standard means that an implementation
+may not be sold whilst claiming to be, for example, "Vulkan
+compatible".
+
+For 3D, this in turn makes public compliance with such standards a hard
+requirement, over and above what would otherwise be set by a RISC-V
+Standards Development Process, covering both software compliance and the
+knock-on implications that has for hardware.
+
+For libraries such as libm and numpy, accuracy is paramount for software
+interoperability across multiple platforms. Some algorithms critically
+rely on correct IEEE754 behaviour, for example. The conflicting accuracy
+requirements can be met through the zfpacc extension.
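
As a minimal sketch of what "critically rely on correct IEEE754" means in practice (plain Python; the function name is illustrative), Knuth's TwoSum error-free transformation recovers the exact rounding error of an addition, and only works because round-to-nearest is bit-exact:

```python
from fractions import Fraction

# Knuth's TwoSum: recovers the exact rounding error e of a floating-point
# addition, so that s + e equals a + b *exactly*. This identity holds only
# under strict IEEE754 round-to-nearest semantics; on hardware that
# deviates from the standard it silently breaks.
def two_sum(a, b):
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    b_round = b - b_virtual
    a_round = a - a_virtual
    e = a_round + b_round
    return s, e

s, e = two_sum(0.1, 0.2)
# verify exactness with exact rational arithmetic
assert Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2)
```

Compensated summation (Kahan) and double-double arithmetic are built on exactly this property, which is why libm-class libraries cannot tolerate "approximately IEEE754" hardware.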
+
+**Collaboration**:
+
+The case for collaboration on any Extension is already well-known.
+In this particular case, the precedent for the inclusion of Transcendentals
+in other ISAs, from both Graphics and High-Performance Computing, has
+these primitives well-established in high-profile software libraries and
+compilers in both GPU and HPC Computer Science divisions. Collaboration
+and shared public compliance with those standards brook no argument.
+
+The combined requirements of collaboration and multiple accuracy
+requirements mean that *overall this proposal is categorically and
+wholly unsuited to relegation to "custom" status*.
+
+# Quantitative Analysis <a name="analysis"></a>
+
+This is extremely challenging. Normally, an Extension would require full,
+comprehensive and detailed analysis of every single instruction, for every
+single possible use-case, in every single market. The amount of silicon
+area required would be balanced against the benefits of introducing extra
+opcodes, and a full market analysis would be performed to see which
+divisions of Computer Science benefit from the introduction of each
+instruction, in each and every case.
+
+With 34 instructions, four possible Platforms, and sub-categories of
+implementations even within each Platform, over 136 separate and distinct
+analyses would be needed: not a practical proposition.
+
+A little more intelligence has to be applied to the problem space,
+to reduce it down to manageable levels.
+
+Fortunately, the subdivision by Platform, in combination with the
+identification of only two primary markets (Numerical Computation and
+3D), means that the logical reasoning applies *uniformly* and broadly
+across *groups* of instructions rather than individually, making it a primarily
+hardware-centric and accuracy-centric decision-making process.
+
+In addition, hardware algorithms such as CORDIC can cover such a wide
+range of operations (simply by changing the input parameters) that the
+usual argument for compromising and excluding certain opcodes - that they
+would significantly increase the silicon area - no longer holds.
+
+However, CORDIC, whilst space-efficient, and thus well-suited to
+Embedded, is an old iterative algorithm not well-suited to High-Performance
+Computing or Mid to High-end GPUs, where commercially-competitive
+FP32 pipeline lengths are only around 5 stages.
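
To make that trade-off concrete, here is a hedged sketch of CORDIC in rotation mode (plain Python; the function name and iteration count are illustrative, not part of any proposal). Each iteration needs only shifts, adds and a small table, hence the tiny area, but retires roughly one bit of accuracy, so FP32 needs on the order of 24+ sequential iterations: far deeper than a 5-stage pipeline.

```python
import math

# Illustrative CORDIC in rotation mode: computes (sin, cos) of an angle
# (radians, |angle| < pi/2) by iteratively rotating the vector (1, 0)
# through angles atan(2^-i), using one shift-and-add pair per iteration.
def cordic_sincos(angle, iterations=32):
    # arctan(2^-i) lookup table, and the constant CORDIC gain K
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, angle      # start vector (1, 0), residual angle z
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        # the multiplies by 2^-i are plain right-shifts in hardware
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]
    return y * k, x * k            # (sin(angle), cos(angle))
```

Roughly one bit of the result converges per iteration, which is exactly why the loop cannot be collapsed into a short commercially-competitive pipeline.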
+
+Not only that, but some operations such as LOG1P, which would normally
+be excluded from one market (due to there being an alternative macro-op
+fused sequence replacing it) are required for other markets due to
+the higher accuracy obtainable at the lower range of input values when
+compared to LOG(1+P).
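
The accuracy difference is easy to demonstrate (a sketch in plain Python, with `math.log1p` standing in for a dedicated LOG1P operation):

```python
import math

# For very small p, the sum 1.0 + p rounds away most of p's significant
# bits, so log(1.0 + p) inherits a large relative error (around 1e-8 for
# p = 1e-10 in double precision), whereas log1p(p) evaluates the same
# mathematical function accurately: log1p(p) ~= p - p*p/2 for tiny p.
p = 1e-10
naive = math.log(1.0 + p)     # accuracy destroyed by the rounded 1.0 + p
accurate = math.log1p(p)      # near full double-precision accuracy
```

This is why LOG1P cannot simply be dropped in favour of a macro-op fused LOG(1+P) sequence for markets that care about the low end of the input range.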
+
+(Thus we start to see why "proprietary" markets are excluded from this
+proposal, because "proprietary" markets would make *hardware*-driven
+optimisation decisions that would be completely inappropriate for a
+common standard).
+
+ATAN and ATAN2 are another example area in which one market's needs
+conflict directly with another's: the only viable solution, without
+compromising one market to the detriment of the other, is to provide both
+opcodes and let implementors make the call as to which (or both) to
+optimise, at the *hardware* level.
+
+Likewise, it is well-known that loops over "0 to 2 times pi", often
+done in subdivisions of powers of two, are costly because they
+involve a floating-point multiplication by PI in each and every
+iteration. 3D GPUs solved this by providing SINPI variants, which range
+from 0 to 1 and perform the multiply *inside* the hardware itself. In
+the case of CORDIC, it turns out that the multiply by PI is not even
+needed (it becomes a loop-invariant magic constant).
+
+However, some markets may not wish to *use* CORDIC, for reasons mentioned
+above, and, again, one market would be penalised if SINPI was prioritised
+over SIN, or vice-versa.
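
A software stand-in sketches the accuracy half of the SINPI argument (plain Python; the function name `sinpi` is purely illustrative). Because the argument is in units of pi, range reduction is exact and integer inputs map to exact zeros, which `sin(pi * x)` cannot achieve since pi itself is rounded before `sin` ever runs:

```python
import math

# Illustrative model of a SINPI-style operation: the argument x is in
# half-revolutions (units of pi), so range reduction is fmod by 2.0,
# which is *exact* in IEEE754, and integer x yields an exact 0.0.
def sinpi(x):
    x = math.fmod(x, 2.0)          # exact: introduces no rounding error
    if x == math.floor(x):         # integer multiples of pi: exact zeros
        return 0.0
    return math.sin(math.pi * x)   # the multiply by pi happens "inside"

# By contrast, sin(pi * 1.0) is non-zero: math.pi is only an
# approximation of pi, and sin() faithfully reports the residual.
```

The same reasoning applies to the other *PI variants in the table below.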
+
+In essence, then, even when only the two primary markets (3D and
+Numerical Computation) have been identified, this still leaves three
+diametrically-opposed *accuracy* sub-markets as the prime
+conflict drivers:
+
+* Embedded Ultra Low Power
+* IEEE754 compliance
+* Khronos Vulkan compliance
+
+Thus the best that can be done is to use Quantitative Analysis to work
+out which "subsets" - sub-Extensions - to include, provide an additional
+"accuracy" extension, be as "inclusive" as possible, and thus allow
+implementors to decide what to add to their implementation, and how best
+to optimise them.
+
+This approach *only* works due to the uniformity of the function space,
+and is **not** an appropriate methodology for use in other Extensions
+with huge (non-uniform) market diversity even with similarly large
+numbers of potential opcodes. BitManip is the perfect counter-example.
+
+# Proposed Opcodes vs Khronos OpenCL vs IEEE754-2019<a name="khronos_equiv"></a>
+
+This list shows the (direct) equivalence between proposed opcodes,
+their Khronos OpenCL equivalents, and their IEEE754-2019 equivalents.
+98% of the opcodes in this proposal that are in the IEEE754-2019 standard
+are present in the Khronos Extended Instruction Set.
+
+For RISC-V opcode encodings see
+[[rv_major_opcode_1010011]]
+
+See
+<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
+and <https://ieeexplore.ieee.org/document/8766229>
+
+* Special FP16 opcodes are *not* being proposed, except by indirect / inherent
+ use of the "fmt" field that is already present in the RISC-V Specification.
+* "Native" opcodes are *not* being proposed: implementors will be expected
+ to use the (equivalent) proposed opcode covering the same function.
+* "Fast" opcodes are *not* being proposed, because the Khronos Specification
+ fast\_length, fast\_normalise and fast\_distance OpenCL opcodes require
+ vectors (or can be done as scalar operations using other RISC-V instructions).
+
+The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
+Deviation from conformance with the Khronos Specification - including the
+Khronos Specification accuracy requirements - is not an option, as it
+results in non-compliance, and the vendor may not use the Trademarked words
+"Vulkan" etc. in conjunction with their product.
+
+IEEE754-2019 Table 9.1 lists "additional mathematical operations".
+Interestingly, the only functions missing when compared to OpenCL are
+compound, exp2m1, exp10m1, log2p1, log10p1, pown (integer power) and powr.
+
+[[!table data="""
+opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast | IEEE754 |
+FSIN | sin | half\_sin | native\_sin | NONE | sin |
+FCOS | cos | half\_cos | native\_cos | NONE | cos |
+FTAN | tan | half\_tan | native\_tan | NONE | tan |
+NONE (1) | sincos | NONE | NONE | NONE | NONE |
+FASIN | asin | NONE | NONE | NONE | asin |
+FACOS | acos | NONE | NONE | NONE | acos |
+FATAN | atan | NONE | NONE | NONE | atan |
+FSINPI | sinpi | NONE | NONE | NONE | sinPi |
+FCOSPI | cospi | NONE | NONE | NONE | cosPi |
+FTANPI | tanpi | NONE | NONE | NONE | tanPi |
+FASINPI | asinpi | NONE | NONE | NONE | asinPi |
+FACOSPI | acospi | NONE | NONE | NONE | acosPi |
+FATANPI | atanpi | NONE | NONE | NONE | atanPi |
+FSINH | sinh | NONE | NONE | NONE | sinh |
+FCOSH | cosh | NONE | NONE | NONE | cosh |
+FTANH | tanh | NONE | NONE | NONE | tanh |
+FASINH | asinh | NONE | NONE | NONE | asinh |
+FACOSH | acosh | NONE | NONE | NONE | acosh |
+FATANH | atanh | NONE | NONE | NONE | atanh |
+FATAN2 | atan2 | NONE | NONE | NONE | atan2 |
+FATAN2PI | atan2pi | NONE | NONE | NONE | atan2Pi |
+FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE | rSqrt |
+FCBRT | cbrt | NONE | NONE | NONE | NONE (2) |
+FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE | exp2 |
+FLOG2 | log2 | half\_log2 | native\_log2 | NONE | log2 |
+FEXPM1 | expm1 | NONE | NONE | NONE | expm1 |
+FLOG1P | log1p | NONE | NONE | NONE | logp1 |
+FEXP | exp | half\_exp | native\_exp | NONE | exp |
+FLOG | log | half\_log | native\_log | NONE | log |
+FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE | exp10 |
+FLOG10 | log10 | half\_log10 | native\_log10 | NONE | log10 |
+FPOW | pow | NONE | NONE | NONE | pow |
+FPOWN | pown | NONE | NONE | NONE | pown |
+FPOWR | powr | half\_powr | native\_powr | NONE | powr |
+FROOTN | rootn | NONE | NONE | NONE | rootn |
+FHYPOT | hypot | NONE | NONE | NONE | hypot |
+FRECIP | NONE | half\_recip | native\_recip | NONE | NONE (3) |
+NONE | NONE | NONE | NONE | NONE | compound |
+NONE | NONE | NONE | NONE | NONE | exp2m1 |
+NONE | NONE | NONE | NONE | NONE | exp10m1 |
+NONE | NONE | NONE | NONE | NONE | log2p1 |
+NONE | NONE | NONE | NONE | NONE | log10p1 |
+"""]]
+
+Note (1) FSINCOS is macro-op fused (see below).
+
+Note (2) synthesised in IEEE754-2019 as "rootn(x, 3)"
+
+Note (3) synthesised in IEEE754-2019 using "1.0 / x"
+
+## List of 2-arg opcodes