From c605bd3e2ea172a47de2bec63160771f2361ff25 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Fri, 9 Aug 2019 05:38:28 +0100
Subject: [PATCH] add cf to zfaccuracy proposal

---
 zfpacc_proposal.mdwn | 61 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 zfpacc_proposal.mdwn

diff --git a/zfpacc_proposal.mdwn b/zfpacc_proposal.mdwn
new file mode 100644
index 000000000..23feec9c1
--- /dev/null
+++ b/zfpacc_proposal.mdwn
@@ -0,0 +1,61 @@
+# FP Accuracy proposal
+
+TODO: writeup
+
+
+ A natural place for a standard reduced accuracy extension "Zfpacc"
+ would be in the reserved bits of FCSR. It could be treated very
+ similarly to how dynamic frm is treated now. Currently, there are 5
+ bits of fflags, 3 bits of frm and 24 Reserved bits. The L (decimal
+ floating-point) extension will presumably use some, but not all of
+ them. I'm unable to find any public proposals for L bit encodings
+ in FCSR.
+
+ For reference, frm is treated as follows: Floating-point operations
+ use either a static rounding mode encoded in the instruction, or
+ a dynamic rounding mode held in frm. Rounding modes are encoded
+ as shown in Table 11.1. A value of 111 in the instruction’s rm
+ field selects the dynamic rounding mode held in frm. If frm is set
+ to an invalid value (101–111), any subsequent attempt to execute
+ a floating-point operation with a dynamic rounding mode will raise
+ an illegal instruction exception.
+
+ Let's say that we wish to support up to 4 accuracy modes -- 2 'fam'
+ bits. Default would be IEEE-compliant, encoded as 00. This means
+ that all current hardware would be compliant with the default mode.
+
+ the unsupported modes would cause a trap to allow emulation where
+ traps are supported. emulation of unsupported modes would be required
+ for unix platforms.
+
+ As with frm, an implementation can choose to support any permutation
+ of dynamic fam-instruction pairs. It will illegal-instruction
+ trap upon executing an unsupported fam-instruction pair.
+ The implementation can then emulate the accuracy mode required.
+
+ there would be a mechanism for user mode code to detect which modes
+ are emulated (csr? syscall?) (if the supervisor decides to make the
+ emulation visible) that would allow user code to switch to faster
+ software implementations if it chooses to.
+
+ If the bits are in FCSR, then the switch itself would be exposed
+ to user mode. User-mode would not be able to detect emulation vs
+ hardware supported instructions, however (by design). That would
+ require some platform-specific code.
+
+ Now, which accuracy modes should be included is a question outside
+ of my expertise and would require a literature review of instruction
+ frequency in key workloads, PPA analysis of simple and advanced
+ implementations, etc. (Thanks for the insights, Mitch!)
+
+ emulation of unsupported modes would be required for unix platforms.
+
+ I don't see why Unix should be required to emulate some arbitrary
+ reduced accuracy ML mode. My guess would be that Unix Platform Spec
+ requires support for IEEE, whereas arbitrary ML platform requires
+ support for Mode XYZ. Of course, implementations of either platform
+ would be free to support any/all modes that they find valuable.
+ Compiling for a specific platform means that support for required
+ accuracy modes is guaranteed (and therefore does not need discovery
+ sequences), while allowing portable code to execute discovery
+ sequences to detect support for alternative accuracy modes.
-- 
2.30.2
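
Note on the FCSR layout discussed in the patch: the following is a minimal Python sketch of how a 2-bit `fam` field might sit alongside `fflags` and `frm`, and how an unsupported mode could be modelled as an illegal-instruction trap. The `fflags` (bits 4:0) and `frm` (bits 7:5) positions are those of the existing FCSR. Everything else is an assumption made for illustration only: the proposal does not say which reserved bits `fam` would occupy, so `FAM_SHIFT = 8` is hypothetical, as are the names `get_fam`, `set_fam`, `check_fp_op` and `IllegalInstruction`.

```python
# Sketch only. fflags/frm positions follow the existing FCSR layout;
# the fam position and all helper names below are hypothetical.
FFLAGS_MASK = 0x1F              # fcsr[4:0], accrued exception flags
FRM_SHIFT, FRM_MASK = 5, 0x7    # fcsr[7:5], dynamic rounding mode
FAM_SHIFT, FAM_MASK = 8, 0x3    # ASSUMPTION: fam in fcsr[9:8] (reserved today)
FAM_IEEE = 0b00                 # default, IEEE-compliant mode from the proposal
RM_DYNAMIC = 0b111              # instruction rm value selecting dynamic frm


class IllegalInstruction(Exception):
    """Models the illegal-instruction trap; a handler could emulate the mode."""


def get_fam(fcsr: int) -> int:
    """Extract the (hypothetical) 2-bit accuracy mode from an fcsr value."""
    return (fcsr >> FAM_SHIFT) & FAM_MASK


def set_fam(fcsr: int, fam: int) -> int:
    """Return fcsr with the fam field replaced; other bits are preserved."""
    if not 0 <= fam <= FAM_MASK:
        raise ValueError("fam must fit in 2 bits")
    cleared = fcsr & ~(FAM_MASK << FAM_SHIFT) & 0xFFFFFFFF
    return cleared | (fam << FAM_SHIFT)


def check_fp_op(fcsr: int, rm: int = RM_DYNAMIC,
                hw_fam=frozenset({FAM_IEEE})) -> int:
    """Return the active fam if hardware handles it, else raise the trap.

    hw_fam is the set of accuracy modes implemented in hardware; by default
    only the IEEE mode, which matches all current hardware.
    """
    if rm == RM_DYNAMIC:
        frm = (fcsr >> FRM_SHIFT) & FRM_MASK
        if frm >= 0b101:  # invalid dynamic rounding modes 101-111
            raise IllegalInstruction("invalid dynamic rounding mode")
    fam = get_fam(fcsr)
    if fam not in hw_fam:
        raise IllegalInstruction("unsupported accuracy mode %d" % fam)
    return fam
```

With the defaults, `check_fp_op(0)` returns `FAM_IEEE`, while an fcsr value produced by `set_fam(0, 0b10)` raises `IllegalInstruction` unless mode `0b10` is included in `hw_fam`. The raise stands in for the trap that, per the proposal, lets a platform emulate the mode in software.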