+[[!tag standards]]
+
# FP Accuracy proposal
+Credits:
+
+* Bruce Hoult
+* Allen Baum
+* Dan Petroski
+* Jacob Lifshay
+
TODO: complete writeup
* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002400.html>
is permitted to meet alternative accuracy requirements, whilst still
retaining the instruction's requested format.
+This proposal is *only* suitable for adding pre-existing accuracy standards
+where it is clearly established, well in advance of applications being
+written that conform to that standard, that dealing with variations in
+accuracy across hardware implementations is the responsibility of the
+application writer. This is the case for both Vulkan and OpenCL.
+
+This proposal is *not* suitable for inclusion of "de-facto" (proprietary)
+accuracy standards (historic IBM Mainframe vs Amdahl incompatibility)
+where there was no prior agreement or notification to applications
+writers that variations in accuracy across hardware implementations
+would occur. In the unlikely event that they *are* ever to be included
+in the future, rather than as a Custom Extension, then, unlike Vulkan
+and OpenCL, they must **only** be added as "bit-for-bit compatible".
+
# Extension of FCSR
Zfpacc would use some of the reserved bits of FCSR. It would be treated
| facc | mode | description |
| ----- | ------- | ------------------- |
-| 0b00H | IEEE754 | correctly rounded |
-| 0b01H | ULP<1 | Unit Last Place < 1 |
-| 0b10H | Vulkan | Vulkan compliant |
-| 0b11H | Appx | Machine Learning |
-
-When bit 0 (H) of facc is set to zero, half-precision mode is
-disabled. When set, an automatic down conversion (FCVT) to half the
-instruction bitwidth (FP32 opcode would convert to FP16) on operands
-is performed, followed by the operation occuring at half precision,
-followed by automatic up conversion back to the instruction's bitwidth.
-
-Note that the format of the operands and result remain the same for
-all opcodes. The only change is in the *accuracy* of the result, not
-its format.
+| 0b000 | IEEE754 | correctly rounded |
+| 0b010 | ULP<1 | Unit Last Place < 1 |
+| 0b100 | Vulkan | Vulkan compliant |
+| 0b110 | Appx | Machine Learning |
+
+(TODO: review alternative idea: ULP0.5, ULP1, ULP2, ULP4, ULP16)
-Pseudocode for half accuracy mode:
+Notes:
- def fpadd32(op1, op2):
- if FCSR.facc.halfmode:
- op1 = fcvt32to16(op1)
- op2 = fcvt32to16(op2)
- result = fpadd32(op1, op2)
- return fcvt16to32(result)
- else:
- # TODO, reduced accuracy if requested
- return op1 + op2
+* facc=0 to match current RISC-V behaviour, where these bits were formerly reserved and set to zero.
+* The format of the operands and result remains the same for
+all opcodes. The only change is in the *accuracy* of the result, not
+its format.
+* facc sets the *minimum* accuracy. It is acceptable to provide *more* accurate results than is requested by a given facc mode (although, clearly, the opportunity for reduced power and latency would be missed).
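
The "minimum accuracy" rule in the notes above can be sketched as a check on the error of an FP32 result, measured in ULPs. This is only an illustration under stated assumptions: the helper names (`to_f32`, `ulp_f32`, `ulp_error`, `meets_facc`) are hypothetical, "correctly rounded" is taken to mean an error of at most 0.5 ULP under round-to-nearest, and the Vulkan and Appx modes are left out because their bounds are per-operation and not defined on this page.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp_f32(x: float) -> float:
    """Spacing of binary32 values at magnitude |x| (one unit in the last place)."""
    x = abs(x)
    if x < 2.0 ** -126:            # zero and subnormals share a fixed spacing
        return 2.0 ** -149
    exp = math.frexp(x)[1] - 1     # x = m * 2**exp with 1 <= m < 2
    return 2.0 ** (exp - 23)       # binary32 has 23 fraction bits

def ulp_error(result: float, exact: float) -> float:
    """Error of `result` relative to the reference `exact`, in binary32 ULPs."""
    return abs(result - exact) / ulp_f32(exact)

def meets_facc(facc: int, result: float, exact: float) -> bool:
    """True if `result` is at least as accurate as the facc mode requires.

    Only the two modes with a single numeric bound are modelled here
    (assumption: 0b000 means <= 0.5 ULP, 0b010 means < 1 ULP).
    """
    err = ulp_error(result, exact)
    if facc == 0b000:
        return err <= 0.5
    if facc == 0b010:
        return err < 1.0
    raise NotImplementedError("per-operation bounds are not defined in this sketch")

exact = 1.0 / 3.0                  # binary64 value used as the reference
best = to_f32(exact)               # correctly rounded FP32 result
worse = best - ulp_f32(exact)      # neighbouring FP32 value, about 0.67 ULP off

print(meets_facc(0b000, best, exact), meets_facc(0b010, best, exact))
print(meets_facc(0b000, worse, exact), meets_facc(0b010, worse, exact))
```

The correctly rounded value satisfies both modes, which is the "more accurate than requested is acceptable" case; the neighbouring value satisfies only the ULP<1 mode.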
## Discussion
- fully-accurate-mode: correctly rounded in all cases
- maybe more modes?
+extra mode suggestions:
+
+ it might be reasonable to add a mode saying you're prepared to accept
+ worse than 0.5 ULP accuracy, perhaps with a few options: 1, 2, 4,
+ 16 or something like that.
+
Question: should better accuracy than is requested be permitted? Example:
Amdahl-370 issues.
There are also 8 and 9-bit floating point formats that could be useful
<https://en.wikipedia.org/wiki/Minifloat>
+
+### function accuracy in standards (opencl, vulkan)
+
+[[resources]] for OpenCL and Vulkan
+
+Vulkan requires full IEEE754 precision for all F/D instructions except for fdiv and fsqrt.
+
+<https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/chap40.html#spirvenv-precision-operation>
+
+Source is here:
+<https://github.com/KhronosGroup/Vulkan-Docs/blob/master/appendices/spirvenv.txt#L1172>
+
+OpenCL is slightly different; suggest adding it as an extra entry.
+
+<https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Env.html#relative-error-as-ulps>
+
+The search link below finds version 2.1 of the OpenCL environment specification (table 8.4.1). It needs checking whether that is the same as the above, which has "SPIRV" in it and is 2.2, not 2.1.
+
+<https://www.google.com/search?q=opencl+environment+specification>
+
+2.1 is superseded by 2.2:
+<https://github.com/KhronosGroup/OpenCL-Docs/blob/master/env/numerical_compliance.asciidoc>
+
+### Compliance
+
+Dan Petroski:
+
+ It’s a bit more complicated than that. Different FP
+ representations/algorithms have different quantization ranges, so you
+ can get more or less precise depending on how large the arguments are.
+
+ For instance, machine A can compute within ULP3 from 0 to 10000, but
+ ULP2 from 10000 upwards. Machine B can compute within ULP2 from 0 to
+ 6000, then ULP3 for 6000+. How do you design a compliance suite which
+ guarantees behavior across all fpaccs?
+
+and from Allen Baum:
+
+ In the example above, you'd need a ratified spec with the defined
+ ranges (possibly per range and per op) - and then implementations
+ would need to at least meet that spec (but could be more accurate)
+
+ so - not impossible, but a lot more work to write different kinds
+ of tests than standard IEEE compatible test would have.
+
+ And, by the way, if you want it to be a ratified spec, it needs a
+ compliance suite, and whoever has defined the spec is responsible
+ for writing it.
+
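
The exchange above suggests a spec expressed as ULP bounds per argument range, which implementations must at least meet. A minimal sketch of such a check follows, using the Machine A and Machine B figures from the quoted example. The `SPEC_ULP3` and `SPEC_ULP2` tables and the `conforms` function are hypothetical, invented here purely for illustration; a real compliance suite would also need bounds per operation and would have to measure the machine's errors rather than take them as given.

```python
import math

# Each entry is (range_start, range_end, max_error_in_ulp); ranges are half-open.
SPEC_ULP3 = [(0.0, math.inf, 3.0)]   # hypothetical spec: 3 ULP everywhere
SPEC_ULP2 = [(0.0, math.inf, 2.0)]   # hypothetical stricter spec

# Claimed accuracy of the two machines in the quoted example.
MACHINE_A = [(0.0, 10000.0, 3.0), (10000.0, math.inf, 2.0)]
MACHINE_B = [(0.0, 6000.0, 2.0), (6000.0, math.inf, 3.0)]

def conforms(machine, spec):
    """True if the machine is at least as accurate as the spec wherever
    their ranges overlap.

    Assumes the machine table covers every range the spec mentions;
    coverage itself is not checked in this sketch.
    """
    for s_lo, s_hi, s_ulp in spec:
        for m_lo, m_hi, m_ulp in machine:
            overlaps = max(s_lo, m_lo) < min(s_hi, m_hi)
            if overlaps and m_ulp > s_ulp:
                return False
    return True

print(conforms(MACHINE_A, SPEC_ULP3), conforms(MACHINE_B, SPEC_ULP3))
print(conforms(MACHINE_A, SPEC_ULP2), conforms(MACHINE_B, SPEC_ULP2))
```

Under the looser spec both machines conform even though their accurate ranges differ; under the stricter one neither does, since each has a range where it only reaches 3 ULP.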