* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.
-Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
+Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Ztrignpi, Zarctrigpi,
Zarctrignpi
+Minimum recommended requirements for Mobile-Embedded 3D: Ztrigpi, Zftrans, Ztrignpi
+
# TODO:
* Decision on accuracy, moved to [[zfpacc_proposal]]
**There are *four* different, disparate platforms' needs (two new)**:
-* 3D Embedded Platform
+* 3D Embedded Platform (new)
* Embedded Platform
-* 3D UNIX Platform
+* 3D UNIX Platform (new)
* UNIX Platform
**The use-cases are**:
that are completely at odds with the other market at the other end of
the spectrum: Numerical Computation.
-Interoperability in Numerical Computation is absolutely critical: it implies
+Interoperability in Numerical Computation is absolutely critical: it implies (correlates directly with)
IEEE754 compliance. However full IEEE754 compliance automatically and
-inherently penalises a GPU, where accuracy is simply just not necessary.
+inherently penalises a GPU on performance and die area, where that accuracy is simply not necessary.
To meet the needs of both markets, the two new platforms have to be created,
and [[zfpacc_proposal]] is a critical dependency. Runtime selection of
requirements conflict with the HPC world, due to the reduced accuracy.
This on the basis that the silicon die area required for IEEE754 is far
greater than that needed for reduced-accuracy, and thus their product would
-be completely unacceptable in the market.
+be completely unacceptable in the market if it had to meet IEEE754 unnecessarily.
An "Embedded 3D" GPU has radically different performance, power
and die-area requirements (and may even target SoftCores in FPGA).
However given that 3D revolves around Standards - DirectX, Vulkan, OpenGL,
OpenCL - users have much more influence than first appears. Compliance
with these standards is critical as the userbase (Games writers, scientific
-applications) expects not to have to rewrite large codebases to conform
-with non-standards-compliant hardware.
+applications) expects not to have to rewrite extremely large and costly codebases to conform
+with *non-standards-compliant* hardware.
-Therefore, compliance with public APIs is paramount, and compliance with
+Therefore, compliance with public APIs (Vulkan, OpenCL, OpenGL, DirectX) is paramount, and compliance with
Trademarked Standards is critical. Any deviation from Trademarked Standards
means that an implementation may not be sold and also make a claim of being,
for example, "Vulkan compatible".
compilers in both GPU and HPC Computer Science divisions. Collaboration
and shared public compliance with those standards brooks no argument.
-*Overall this proposal is categorically and wholly unsuited to
+The combined need for collaboration and for multiple accuracy levels means that
+*overall this proposal is categorically and wholly unsuited to
relegation to "custom" status*.
# Quantitative Analysis <a name="analysis"></a>
# Subsets
+The full set is based on the Khronos OpenCL opcodes. If implemented in its entirety it would be too much for both Embedded and 3D.
+
The subsets are organised by hardware complexity and by need (3D, HPC). However, because synthesis produces inaccurate results at the range limits, the less common subsets are still required for IEEE754 HPC.
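
As a hedged illustration of why synthesising one opcode from another can lose accuracy at large arguments, the sketch below compares two ways of computing sin(pi*x) in Python. The helper names `sinpi_naive` and `sinpi_reduced` are hypothetical and are not part of this proposal. The only point being made is that multiplying by an already-rounded pi before calling sin leaves an error that grows with x, whereas a dedicated sinpi can reduce the argument exactly first.

```python
import math

def sinpi_naive(x):
    # Synthesised from plain sin: math.pi is already rounded, so the
    # absolute error in pi*x grows in proportion to x.
    return math.sin(math.pi * x)

def sinpi_reduced(x):
    # What a dedicated sinpi can do: reduce modulo the period (2) first.
    # fmod is exact in floating point, so no error is introduced here.
    r = math.fmod(x, 2.0)
    return math.sin(math.pi * r)

x = 2.0 ** 20            # an even integer, so sin(pi*x) is exactly 0
print(sinpi_naive(x))    # small but non-zero
print(sinpi_reduced(x))  # 0.0
```

The non-zero result of the naive version is the rounding error of pi scaled up by x, which is one way that synthesis from a smaller subset falls short of IEEE754 expectations.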
MALI Midgard, an embedded / mobile 3D GPU, for example only has the following opcodes:
These in FP32 and FP16 only: no FP64 hardware, at all.
-Vivante Embedded/Mobile 3D (etnaviv <https://github.com/laanwj/etna_viv/blob/master/rnndb/isa.xml>) has sin, cos, sin2pi, cos2pi, log2, exp, sqrt and rsqrt and recip. It also has fast variants of some of these, as a CSR Mode.
+Vivante Embedded/Mobile 3D (etnaviv <https://github.com/laanwj/etna_viv/blob/master/rnndb/isa.xml>) only has the following:
+
+ sin, cos2pi
+ cos, sin2pi
+ log2, exp
+ sqrt and rsqrt
+ recip.
+
+It also has fast variants of some of these, as a CSR Mode.
Also, as a general point: customised, optimised hardware targeting FP32 3D with reduced accuracy can be used neither for IEEE754 nor for FP64 (except as a starting point for a hardware- or software-driven Newton-Raphson or other iterative method).
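
The "starting point" role can be sketched as follows, using reciprocal square-root as the example. This is a minimal illustration under stated assumptions, not a description of any particular GPU: `crude_rsqrt` is a hypothetical stand-in for a low-accuracy hardware estimate (here simulated by truncating the exact answer to 8 mantissa bits), and `refine_rsqrt` applies the standard Newton-Raphson step for 1/sqrt(x).

```python
import math

def crude_rsqrt(x, bits=8):
    # Hypothetical low-accuracy estimate: the exact value with its
    # mantissa truncated to `bits` bits (assumption, for illustration).
    exact = 1.0 / math.sqrt(x)
    m, e = math.frexp(exact)          # exact == m * 2**e, 0.5 <= m < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)

def refine_rsqrt(x, y, steps):
    # Newton-Raphson on f(y) = 1/y**2 - x: y <- y * (1.5 - 0.5*x*y*y).
    # Each step roughly squares the relative error.
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def rel_err(x, y):
    # Relative error of y as an approximation to 1/sqrt(x).
    return abs(y * math.sqrt(x) - 1.0)

x = 2.0
y0 = crude_rsqrt(x)
for steps in range(3):
    print(steps, rel_err(x, refine_rsqrt(x, y0, steps)))
```

Each extra step costs several multiplies, which is the performance price paid for recovering accuracy that the reduced-accuracy hardware did not provide.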
## ZftransAdv
-These are simply much more complex to implement in hardware, and typically will only be put into HPC applications.
+Cube-root, Power, Root: these are simply much more complex to implement in hardware, and typically will only be put into HPC applications.
+
+Root is included as well as Power because at the extreme ranges one is more accurate than the other.
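
One plausible mechanism behind this, offered as an assumption rather than as the proposal's stated reasoning: when a root is synthesised as Power with exponent 1/n, the exponent itself is rounded before the operation starts, and that rounding error is multiplied by ln(x). The Python sketch below estimates the resulting relative error for a cube root near the top of the FP64 range, even if the Power unit itself were perfectly accurate. All variable names are illustrative.

```python
import math
from fractions import Fraction

n = 3
y = 1.0 / n                                   # exponent as actually passed to Power
exp_err = abs(Fraction(y) - Fraction(1, n))   # exact rounding error in the exponent

x = 2.0 ** 1023                               # near the top of the FP64 range
# x**(y + d) == x**y * exp(d * ln(x)), so the relative error is about d * ln(x).
rel_err = float(exp_err) * math.log(x)
eps_multiples = rel_err / 2.0 ** -52          # in multiples of FP64 machine epsilon
print(rel_err, eps_multiples)
```

A dedicated Root opcode that takes n directly does not have to round 1/n first, so it is not subject to this particular amplification.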
* **Zfrsqrt**: Reciprocal square-root.