From c32dff46f19b11c1b123dd70f0839159ee231f55 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Fri, 9 Aug 2019 08:42:54 +0100
Subject: [PATCH] reformat Zfpacc

---
 zfpacc_proposal.mdwn | 98 ++++++++++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 45 deletions(-)

diff --git a/zfpacc_proposal.mdwn b/zfpacc_proposal.mdwn
index 68895d7d3..9deb462c1 100644
--- a/zfpacc_proposal.mdwn
+++ b/zfpacc_proposal.mdwn
@@ -1,55 +1,62 @@
 # FP Accuracy proposal
 
-TODO: writeup
+TODO: complete writeup
 
 *
 *
-    A natural place for a standard reduced accuracy extension "Zfpacc"
-    would be in the reserved bits of FCSR. It could be treated very
-    similarly to how dynamic frm is treated now. Currently, there are 5
-    bits of fflags, 3 bits of frm and 24 Reserved bits. The L (decimal
-    floating-point) extension will presumably use some, but not all of
-    them. I'm unable to find any public proposals for L bit encodings
-    in FCSR.
-
-    For reference, frm is treated as follows: Floating-point operations
-    use either a static rounding mode encoded in the instruction, or
-    a dynamic rounding mode held in frm. Rounding modes are encoded
-    as shown in Table 11.1. A value of 111 in the instruction’s rm
-    field selects the dynamic rounding mode held in frm. If frm is set
-    to an invalid value (101–111), any subsequent attempt to execute
-    a floating-point operation with a dynamic rounding mode will raise
-    an illegal instruction exception.
-
-    Let's say that we wish to support up to 4 accuracy modes -- 2 'fam'
-    bits. Default would be IEEE-compliant, encoded as 00. This means
-    that all current hardware would be compliant with the default mode.
-
-    the unsupported modes would cause a trap to allow emulation where
-    traps are supported. emulation of unsupported modes would be required
-    for unix platforms.
-
-    As with frm, an implementation can choose to support any permutation
-    of dynamic fam-instruction pairs. It will illegal-instruction
-    trap upon executing an unsupported fam-instruction pair.
-    The implementation can then emulate the accuracy mode required.
-
-    there would be a mechanism for user mode code to detect which modes
-    are emulated (csr? syscall?) (if the supervisor decides to make the
-    emulation visible) that would allow user code to switch to faster
-    software implementations if it chooses to.
-
-    If the bits are in FCSR, then the switch itself would be exposed
-    to user mode. User-mode would not be able to detect emulation vs
-    hardware supported instructions, however (by design). That would
-    require some platform-specific code.
-
-    Now, which accuracy modes should be included is a question outside
-    of my expertise and would require a literature review of instruction
+Zfpacc: a proposal to allow implementations to dynamically set the bit-accuracy
+of results, trading speed (reduced latency) for accuracy (higher latency).
+
+# Extension of FCSR
+
+Zfpacc would use some of the the reserved bits of FCSR. It would be treated
+very similarly to how dynamic frm is treated.
+
+frm is treated as follows:
+
+* Floating-point operations use either a static rounding mode encoded
+  in the instruction, or a dynamic rounding mode held in frm.
+* Rounding modes are encoded as shown in Table 11.1 of the RISC-V ISA Spec
+* A value of 111 in the instruction’s rm field selects the dynamic rounding
+  mode held in frm. If frm is set to an invalid value (101–111),
+  any subsequent attempt to execute a floating-point operation with a
+  dynamic rounding mode will raise an illegal instruction exception.
+
+If we wish to support up to 4 accuracy modes, that would require 2 'fam'
+bits. The Default would be IEEE754-compliant, encoded as 00. This means
+that all current hardware would be compliant with the default mode.
+
+Unsupported modes cause a trap to allow emulation where traps are supported.
+Emulation of unsupported modes would be required for UNIX platforms.
+As with frm, an implementation may choose to support any permutation
+of dynamic fam-instruction pairs. It will illegal-instruction trap upon
+executing an unsupported fam-instruction pair. The implementation can
+then emulate the accuracy mode required.
+
+If the bits are in FCSR, then the switch itself would be exposed to
+user mode. User-mode would not be able to detect emulation vs hardware
+supported instructions, however (by design). That would require some
+platform-specific code.
+
+Emulation of unsupported modes would be required for unix platforms.
+
+TODO:
+
+A mechanism for user mode code to detect which modes are emulated
+(csr? syscall?) (if the supervisor decides to make the emulation visible)
+that would allow user code to switch to faster software implementations
+if it chooses to.
+
+TODO:
+
+Choose which accuracy modes are required
+
+    Which accuracy modes should be included is a question outside of
+    my expertise and would require a literature review of instruction
     frequency in key workloads, PPA analysis of simple and advanced
-    implementations, etc. (Thanks for the insights, Mitch!)
+    implementations, etc.
 
-    emulation of unsupported modes would be required for unix platforms.
+TODO: reduced accuracy
 
     I don't see why Unix should be required to emulate some arbitrary
     reduced accuracy ML mode. My guess would be that Unix Platform Spec
@@ -60,3 +67,4 @@ TODO: writeup
     accuracy modes is guaranteed (and therefore does not need discovery
     sequences), while allowing portable code to execute discovery sequences
     to detect support for alternative accuracy modes.
+
-- 
2.30.2