openpower/sv/rfc/ls002/discussion.mdwn

# Links

* [[sv/int_fp_mv]]

# Questions (09 oct 2022)

**
1. What is "BF16"?  It seems not to be mentioned in the architecture spec.
   The architecture spec (VSX chapter) defines two 16-bit binary FP formats.
   Judging by the way the RFC uses "BF16", I think it means what the VSX
   chapter calls "bfloat16", which has the exponent in the same bits as
   single format.  This should be clarified, and the corresponding format
   will need to be defined in Section 4.3.1 (Data Format).
**

BF16 seems to be an equally commonly used term for bfloat16, yes.

**
2. For fishmv, what happens if the value supplied in the FPR is not
   representable in single format?
**

I'm assuming you're asking what happens when, say, `f3 = 0x0080_0000_0000_0001` and `fishmv f3, 0xABCD` is executed:
Exactly the same thing as if the FPR value isn't representable in f32 format for stfs: the value stored is defined by the `SINGLE` pseudo-code function, and no fp status bits are set. Likewise, the input f32 value for fishmv is determined by the `SINGLE` pseudo-code function, and no fp status bits are set. fishmv then replaces the lower 16 bits of the f32 value with the immediate, converts the resulting f32 back to f64 using `DOUBLE`, and stores it in FRT.
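The sequence described in this answer can be sketched in Python roughly as follows. This is a minimal model under stated assumptions, not the RFC's pseudo-code: `single_bits` and `double_bits` are hypothetical helper names that cover only the bit-selection paths of `SINGLE` and `DOUBLE` (normal values, zero, infinity, NaN), and the placement of the `fmvis` immediate in the upper 16 bits of the f32 is assumed from the BF16 discussion above.

```python
MASK16 = 0xFFFF


def double_bits(word: int) -> int:
    """Hypothetical simplified DOUBLE(): f32 bit pattern -> f64 bit pattern."""
    sign = (word >> 31) & 1
    exp = (word >> 23) & 0xFF
    frac = word & 0x7FFFFF
    if exp == 0 and frac != 0:
        raise NotImplementedError("f32 subnormals are left out of this sketch")
    if exp == 0:          # zero
        exp64 = 0
    elif exp == 0xFF:     # infinity or NaN
        exp64 = 0x7FF
    else:                 # normal: re-bias the exponent from 127 to 1023
        exp64 = exp - 127 + 1023
    return (sign << 63) | (exp64 << 52) | (frac << 29)


def single_bits(frt: int) -> int:
    """Hypothetical simplified SINGLE(): f64 bit pattern -> f32 bit pattern."""
    exp64 = (frt >> 52) & 0x7FF
    if exp64 <= 896 and (frt & ((1 << 63) - 1)) != 0:
        raise NotImplementedError("denormalization path is left out of this sketch")
    # Pure bit selection: FRS[0:1] and FRS[5:34] in MSB0 numbering.
    return (((frt >> 62) & 0x3) << 30) | ((frt >> 29) & 0x3FFFFFFF)


def fmvis(imm16: int) -> int:
    """Assumed model: immediate becomes the upper half of an f32, then DOUBLE()."""
    return double_bits((imm16 & MASK16) << 16)


def fishmv(frt: int, imm16: int) -> int:
    """SINGLE(FRT), replace the lower 16 bits with the immediate, then DOUBLE()."""
    word = single_bits(frt)
    word = (word & 0xFFFF0000) | (imm16 & MASK16)
    return double_bits(word)


# Build the f32 approximation of pi (bit pattern 0x40490FDB) in two steps.
frt = fishmv(fmvis(0x4049), 0x0FDB)
print(f"{frt:016x}")
```

Inputs that fall outside the bit-selection path, such as the `f3` value quoted above, make this sketch raise `NotImplementedError` rather than guess at a result.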

**
3. The first clause of the verbal description of fishmv seems to assume
   that the contents of the specified register were produced by fmvis.
   Is there any other use of fishmv?  If yes, the verbal description should
   be generalized.  If no, the wording should be explicit about this use.
**

given that the bits are spread out in `DOUBLE()` format it seems unlikely.
if the bits were placed contiguously (sequentially) then it would indeed
be a different matter: temporary storage for constants to be transferred
directly (unmodified) to GPRs for example.  but DOUBLE() formatting
makes that not possible unfortunately.
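To illustrate what "spread out" means here, the sketch below widens an f32 bit pattern to its 64-bit image and shows that the original 32 bits appear in neither half of the result. It uses Python's `struct` module as a stand-in for `DOUBLE()`, which is an assumption that only holds for normal values; `f32_bits_as_fpr` is a hypothetical helper name.

```python
import struct


def f32_bits_as_fpr(word: int) -> int:
    """Return the f64 bit pattern holding the same value as an f32 bit pattern."""
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    (image,) = struct.unpack(">Q", struct.pack(">d", value))
    return image


word = 0x40490FDB                      # f32 approximation of pi
image = f32_bits_as_fpr(word)
high, low = image >> 32, image & 0xFFFFFFFF

print(f"f32 bits : {word:08x}")
print(f"FPR image: {image:016x}")      # 400921fb60000000
print("contiguous copy present:", word in (high, low))
```

The exponent is re-biased and widened and the fraction is shifted, so a GPR move of the raw FPR contents would not recover the 32-bit constant.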

**
4. The instruction names and mnemonics should be more consistent with the
   architecture spec.  In particular, the architecture spec tends to use
   "Move" for instructions that transfer data between registers.  Here are
   two approaches.
**

```
   a. Model the instructions on li (Load Immediate), an extended mnemonic for
      addi.
        fmvis --> Floating Load Immediate Single (flis)
        fishmv --> Floating Load Immediate Single Lower (flisl)
      Under this approach the new instructions would belong in their own
      3-level section, after Section 4.6.4 (Floating-Point Load and Store
      Double Pair Instructions).

   b. Model the instructions on lxvkq (and the existing FP Load instructions)
        fmvis --> Load Floating-Point Single Immediate (lfsi)
        fishmv --> Load Floating-Point Single Immediate Lower (lfsil)
      Under this approach the new instructions would belong in Section 4.6.2
      (Floating-Point Load Instructions), with the Load Floating-Point
      Single instructions.

   I prefer (a), because I think it's confusing to treat these instructions,
   which don't access storage, like instructions that do access storage.
```

the fact that they bypass D-Cache and correspondingly raise no flags or
exceptions is the connection to `ld`.  despite that, i like (a) as well,
although for purely non-technical reasons (more "memorable") i (Luke) do love
the two mnemonics `flis fishmv` :)

we picked "s" on the end of `fmvis` (`flis`) because it is "shifted"
(like `oris`), not "single". this was accidentally left out of the initial RFC submission.

An alternative suggestion by Jacob Lifshay is to name them `flis` (fp load immediate shifted) and `fli2` (fp load immediate 2nd-part), with possible (though unlikely) `fli3`/`fli4` to load the rest of the bits needed for an f64.

Other:

**
1. The RFC should be based on the current version of the architecture,
   which is V. 3.1B.  I believe this has no effect on the substance of the
   RFC.  But it affects the identities of the instruction-list appendices,
   which in V. 3.1B are E, F, G, and H.
**

acknowledged.  will edit. done.

**
2. Additional affected sections are 1.6.1.6 (additional line for DX-form),
   1.6.2 (additional use for d0,d1,d2), and Appendix D (Opcode Maps).
**

ditto. TODO.

missed the addition to 1.6.1.6 (DX-Form).  TODO

**
3. Does the last line of the Summary apply to both instructions or just to
   fishmv?  I can see why you would want a prefixed version of fmvis, which
   would supply the entire 32-bit FP single format value and avoid the need
   for fishmv.  Why would you want a prefixed version of fishmv?
**

the analysis counting instructions and D-Cache Loads actually shows
that whilst the initial idea for `pfmvis` would be to fill in the
remaining mantissa and high exponent bits to complete a full FP64,
the cost of doing so is:

* 1x 64-bit pflis
* 1x 64-bit pfishmv

which totals 16 bytes loaded, which is actually *more* than just `lfd`,
which is only 4 + 8 bytes (instruction plus data).  the only technical reason therefore is
to avoid D-Cache entirely, just like the 5-instruction sequence
that writes a 64-bit GPR only from immediates
(lis, ori, rldicr, oris, ori), although that is justifiable
as a critical means of bootstrapping (constructing 64-bit addresses).
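The byte counting in this reply can be written out as a small sketch. The constant and function names are illustrative only; the sizes are those stated above (two 64-bit prefixed instructions versus one 32-bit `lfd` plus the 8 bytes of data it loads).

```python
PREFIXED_INSN_BYTES = 8   # one 64-bit prefixed instruction
WORD_INSN_BYTES = 4       # one 32-bit instruction such as lfd
FP64_DATA_BYTES = 8       # the FP64 constant that lfd fetches via the D-Cache


def prefixed_pair_bytes() -> int:
    """Bytes fetched for a prefixed fmvis/fishmv-style pair (instructions only)."""
    return 2 * PREFIXED_INSN_BYTES


def lfd_bytes() -> int:
    """Bytes fetched for lfd: the instruction itself plus its data."""
    return WORD_INSN_BYTES + FP64_DATA_BYTES


print(prefixed_pair_bytes(), "bytes vs", lfd_bytes(), "bytes")
```

The prefixed pair costs more bytes overall, so its only advantage in this accounting is that none of those bytes go through the D-Cache.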

**
4. The Motivation says "Even clearing an FPR to zero presently requires Load".
   What about fsub FRT,FRA,FRA?
**

That doesn't actually clear FRT to zero because NaN - NaN = Inf - Inf = NaN, not zero. Also, with round to -inf, 0 - 0 produces -0, not 0.
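The NaN and infinity cases can be checked with ordinary IEEE 754 doubles, for example in Python. This is only an illustration: `sub_self` is a hypothetical stand-in for `fsub FRT,FRA,FRA`, and the round-to-minus-infinity case is not reproduced because the Python standard library does not expose the binary floating-point rounding mode.

```python
import math


def sub_self(x: float) -> float:
    """Stand-in for fsub FRT,FRA,FRA under the default rounding mode."""
    return x - x


for x in (1.5, float("inf"), float("nan")):
    result = sub_self(x)
    print(f"{x!r} - {x!r} = {result!r}")
```

Only the finite input yields zero; an infinity or NaN left in FRA produces NaN in FRT.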

**
5. "FRS" for both instructions should be changed to "FRT".  ("FRS" normally
   specifies a source register; see Section 1.6.2.  I understand that for
   fishmv the specified register is both source and target.  But "TX,T"
   provides precedent for using the "target form" of register specification
   for such cases.)
6. The RTL for fmvis should use left arrow for assignment.
**

RTL error corrected. ack on FRT.

**
7. The architecture spec (VSX chapter) uses "BFP32" and "BFP64", and the
   lower-case versions thereof, for the 32-bit and 64-bit binary FP formats.
   The RFC's "FP32" and "FP64" (and lower case of same) should be made
   consistent with this usage.
**

acknowledged. TODO.

**
8. More generally, the style of the verbal description for both instructions
   should be made more consistent with the style used in the architecture
   spec.
**

yes Paul kindly gave advice on that.

**
9. In the first clause of the verbal description of fishmv I think "inserted
   into FRS" should be "inserted into the low-order half of the single-
   format value corresponding to the contents of FRT".
   A similar change should be made in the second sentence of the next
   paragraph.
**

ack. TODO.

**
10. The paragraph before the Programming Note in the fishmv description
   says "This is strategically similar to how li combined with oris is used
   to construct 32-bit Integers".  li combined with oris works only if bit 16
   of the desired 32-bit integer is 0.  (A better way to construct a 32-bit
   integer is to use pli (extended mnemonic for paddi).)
**

it is unlikely that we (Libre-SOC) will initially implement any of v3.1
64-bit prefixing (it cannot be Vectorised without resulting in
96-bit instructions, which we decided is too much). that said, the LD
addressing immediate extended range is extremely useful (along with the PC-relative modes and also other instructions such as paddi).

bottom line: we have not yet given much thought to using any v3.1 Scalar
Prefixed instructions at all, so don't even know most of what they do.
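The `li` plus `oris` limitation raised in question 10 can be seen in a small Python model. This is a sketch under assumptions: the helper names are hypothetical, `li` is modelled as sign-extending its 16-bit immediate to 64 bits, and `oris` as ORing its zero-extended immediate shifted left by 16.

```python
MASK64 = (1 << 64) - 1


def li(si16: int) -> int:
    """Model of li RT,SI (addi RT,0,SI): sign-extend a 16-bit immediate."""
    si16 &= 0xFFFF
    if si16 & 0x8000:
        si16 -= 0x10000
    return si16 & MASK64


def oris(rs: int, ui16: int) -> int:
    """Model of oris RA,RS,UI: OR in the immediate shifted left by 16."""
    return (rs | ((ui16 & 0xFFFF) << 16)) & MASK64


def li_oris(value32: int) -> int:
    """Attempt to build a 32-bit constant from its two 16-bit halves."""
    return oris(li(value32 & 0xFFFF), (value32 >> 16) & 0xFFFF)


for wanted in (0x12345678, 0x12348000):
    got = li_oris(wanted)
    print(f"wanted {wanted:#x}, got {got:#x}, ok={got == wanted}")
```

When the top bit of the low halfword is set, the sign extension performed by `li` fills the upper bits with ones, and `oris` can only set bits, never clear them, so the intended upper halfword is lost.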