openpower/sv/rfc/ls002/discussion.mdwn

# Links

* [[sv/int_fp_mv]]

# v3.1 Prefixed instructions

**PREFIXED INSTRUCTIONS ARE 100% OUT OF SCOPE OF THIS RFC**.

please do not extend the scope of this RFC beyond the two
32-bit instructions.

# Questions (09 Oct 2022)

**Substantive or semi-substantive:**

**
1. What is "BF16"?  It seems not to be mentioned in the architecture spec.
   The architecture spec (VSX chapter) defines two 16-bit binary FP formats.
   Judging by the way the RFC uses "BF16", I think it means what the VSX
   chapter calls "bfloat16", which has the exponent in the same bits as
   single format.  This should be clarified, and the corresponding format
   will need to be defined in Section 4.3.1 (Data Format).
**

BF16 is an equally common name for bfloat16, yes.
done, added.
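
For reference, here is a minimal Python sketch of the bfloat16/binary32 relationship described in the question. bfloat16 keeps binary32's sign bit, its 8-bit exponent, and only the top 7 fraction bits, so a BF16 value is exactly the upper half of a binary32 bit pattern. The function names are hypothetical, and the narrowing helper truncates rather than rounds.

```python
import struct


def bf16_to_bfp32(bf16: int) -> float:
    """Widen a bfloat16 bit pattern to the binary32 value it denotes.

    bfloat16 uses the same sign bit and 8-bit exponent field as binary32
    and keeps only the top 7 bits of the 23-bit fraction, so widening is
    just a 16-bit left shift of the bit pattern.
    """
    bits32 = (bf16 & 0xFFFF) << 16
    return struct.unpack(">f", struct.pack(">I", bits32))[0]


def bfp32_to_bf16_truncate(x: float) -> int:
    """Narrow a binary32 value to bfloat16 by truncation (no rounding)."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16


print(hex(bfp32_to_bf16_truncate(1.0)))  # 0x3f80
print(bf16_to_bfp32(0xC000))             # -2.0
```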

**
2. For fishmv, what happens if the value supplied in the FPR is not
   representable in single format?
**

I'm assuming you're asking what happens if something like `f3 = 0x0080_0000_0000_0001` and `fishmv f3, 0xABCD` is executed.
The answer is exactly the same as when the FPR value isn't representable in f32 format for `stfs`: the value stored is defined by the `SINGLE` pseudo-code function, and no FP status bits are set. Likewise, the input f32 value for fishmv is determined by the `SINGLE` pseudo-code function, again without setting any FP status bits. fishmv then replaces the lower 16 bits of that f32 value with the immediate, converts the result back to f64 using `DOUBLE`, and stores it in FRT.

Ultimately, these are immediates, statically compiled.  if the developer
wants "invalid" data statically compiled into a binary, it is reasonable
to assume they have good reasons for doing so.
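
To make that sequence concrete, here is a minimal Python sketch of the two instructions operating on 64-bit FPR bit patterns. `single()` and `double()` follow my reading of the Power ISA `SINGLE` and `DOUBLE` pseudo-code functions (bit 0 = MSB). The sketch does not model `SINGLE` inputs whose exponent falls below the denormalization range and simply raises for them. `flis` and `fishmv` follow the description above. This is illustrative, not the normative RFC pseudocode.

```python
def single(frs: int) -> int:
    """Sketch of SINGLE(): 64-bit FPR bit pattern -> 32-bit single-format pattern."""
    exp = (frs >> 52) & 0x7FF
    if exp > 896 or (frs & 0x7FFF_FFFF_FFFF_FFFF) == 0:
        # No denormalization required (includes zero, infinity and NaN):
        # WORD0:1 <- FRS0:1 and WORD2:31 <- FRS5:34, i.e. pure bit selection.
        return (((frs >> 62) & 0b11) << 30) | ((frs >> 29) & 0x3FFF_FFFF)
    if 874 <= exp <= 896:
        # Denormalization required: shift 1.fraction right until exp == -126.
        sign = (frs >> 63) & 1
        frac = (1 << 52) | (frs & ((1 << 52) - 1))   # frac0:52 <- 0b1 || FRS12:63
        frac >>= 897 - exp
        return (sign << 31) | ((frac >> 29) & 0x7F_FFFF)  # WORD9:31 <- frac1:23
    # Assumption: this sketch does not model exponents below 874.
    raise ValueError("exponent below 874: not modelled by this sketch")


def double(word: int) -> int:
    """Sketch of DOUBLE(): 32-bit single-format pattern -> 64-bit FPR bit pattern."""
    exp = (word >> 23) & 0xFF
    frac = word & 0x7F_FFFF
    if exp == 0 and frac != 0:
        # Denormalized single: normalize it (the result is a normal binary64).
        e = -126
        f = frac << 29                     # frac0:52 <- 0b0 || WORD9:31 || 29 zeros
        while not (f >> 52) & 1:           # do while frac0 = 0
            f <<= 1
            e -= 1
        return (((word >> 31) & 1) << 63) | ((e + 1023) << 52) | (f & ((1 << 52) - 1))
    # Normalized: FRT2:4 <- ~WORD1 (x3).  Zero/Inf/NaN: FRT2:4 <- WORD1 (x3).
    # Both copy WORD0:1 -> FRT0:1 and WORD2:31 -> FRT5:34, zero-filling FRT35:63.
    w1 = (word >> 30) & 1
    fill = (w1 ^ 1) if 0 < exp < 255 else w1
    return (((word >> 30) & 0b11) << 62) | ((fill * 0b111) << 59) | ((word & 0x3FFF_FFFF) << 29)


def flis(imm16: int) -> int:
    """flis/fmvis FRT,imm: BF16 immediate || 16 zero bits, widened by DOUBLE()."""
    return double((imm16 & 0xFFFF) << 16)


def fishmv(frt: int, imm16: int) -> int:
    """fishmv FRT,imm: replace the low 16 bits of SINGLE((FRT)), then DOUBLE()."""
    word = (single(frt) & 0xFFFF_0000) | (imm16 & 0xFFFF)
    return double(word)


# Build binary32 pi (0x40490FDB) in an FPR with two 32-bit instructions.
f1 = flis(0x4049)          # BF16 approximation of pi: 3.140625
f1 = fishmv(f1, 0x0FDB)    # now exactly float32(pi), held in binary64 format
print(hex(f1))             # 0x400921fb60000000
```

No FPSCR state is read or written anywhere in the sketch, which matches the "no FP status bits are set" behaviour described above.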

**
3. The first clause of the verbal description of fishmv seems to assume
   that the contents of the specified register were produced by fmvis.
   Is there any other use of fishmv?  If yes, the verbal description should
   be generalized.  If no, the wording should be explicit about this use.
**

given that the bits are spread out in `DOUBLE()` format, other uses seem unlikely.
if the bits were placed contiguously (sequentially) then it would indeed
be a different matter: temporary storage for constants to be transferred
directly (unmodified) to GPRs, for example.  but DOUBLE() formatting
unfortunately makes that impossible.

however, alternative uses by programmers cannot be ruled out. it may
be the case that, despite the format being DOUBLE(), there is in fact
an FPR->GPR transfer instruction that can at least get the 32 bits
of immediate back out as a contiguous, undamaged block.  thus adding
notes that may turn out to be restrictive is inadvisable.

additional note: DOUBLE() has been noted to perform normalisation,
which would make alternative uses even more unlikely.
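
A quick way to see the "spread out" layout is to compare a binary32 bit pattern with the binary64 bit pattern of the same value. This Python sketch uses `struct` for the exact binary32-to-binary64 widening. It is valid for finite values only, and NaN payload handling is not modelled. The function name is hypothetical.

```python
import struct


def bfp32_bits_to_bfp64_bits(word: int) -> int:
    """Return the binary64 bit pattern holding the same (finite) value as `word`."""
    value = struct.unpack(">f", struct.pack(">I", word & 0xFFFF_FFFF))[0]  # exact widening
    return struct.unpack(">Q", struct.pack(">d", value))[0]


word = 0x40490FDB                       # binary32 pi, as supplied by flis + fishmv
fpr = bfp32_bits_to_bfp64_bits(word)
print(hex(fpr))                         # 0x400921fb60000000
print(hex(fpr >> 32))                   # 0x400921fb, not 0x40490fdb
print(fpr & ((1 << 29) - 1) == 0)       # True: the low 29 bits are always zero
```

The sign bit moves by 32 positions, the fraction moves by 29, and the 8-bit exponent is re-biased into an 11-bit field. As a result, the 32 bits of the immediate never appear as a single contiguous 32-bit slice of the FPR.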

**
4. The instruction names and mnemonics should be more consistent with the
   architecture spec.  In particular, the architecture spec tends to use
   "Move" for instructions that transfer data between registers.  Here are
   two approaches.
**

```
   a. Model the instructions on li (Load Immediate), an extended mnemonic for
      addi.
        fmvis --> Floating Load Immediate Single (flis)
        fishmv --> Floating Load Immediate Single Lower (flisl)
      Under this approach the new instructions would belong in their own
      3-level section, after Section 4.6.4 (Floating-Point Load and Store
      Double Pair Instructions).

   b. Model the instructions on lxvkq (and the existing FP Load instructions)
        fmvis --> Load Floating-Point Single Immediate (lfsi)
        fishmv --> Load Floating-Point Single Immediate Lower (lfsil)
      Under this approach the new instructions would belong in Section 4.6.2
      (Floating-Point Load Instructions), with the Load Floating-Point
      Single instructions.

   I prefer (a), because I think it's confusing to treat these instructions,
   which don't access storage, like instructions that do access storage.
```

the fact that they bypass the D-Cache, and correspondingly raise no flags or
exceptions, is the connection to `ld`.  despite that, i like (a) as well,
although for purely non-technical reasons (they are more "memorable") i (Luke)
do love the two mnemonics `flis fishmv` :)

we picked the "s" on the end of `fmvis` (`flis`) because it stands for "shifted"
(as in `oris`), not "single".

**Other:**

**
1. The RFC should be based on the current version of the architecture,
   which is V. 3.1B.  I believe this has no effect on the substance of the
   RFC.  But it affects the identities of the instruction-list appendices,
   which in V. 3.1B are E, F, G, and H.
**

acknowledged.  will edit. done v3.1B, done EFGH.

**
2. Additional affected sections are 1.6.1.6 (additional line for DX-form),
   1.6.2 (additional use for d0,d1,d2), and Appendix D (Opcode Maps).
**

ditto. done 1.6.2 (FRS).

missed the addition to 1.6.1.6 (DX-Form).  done.

**
3. Does the last line of the Summary apply to both instructions or just to
   fishmv?  I can see why you would want a prefixed version of fmvis, which
   would supply the entire 32-bit FP single format value and avoid the need
   for fishmv.  Why would you want a prefixed version of fishmv?
**

the more interesting initial question is, "why no `pflis`?", and
the answer to that is "because flis and fishmv do exactly the same
job in exactly the same number of bits" (64).
`flis` fills in a BF16, `fishmv` extends it to an FP32,
and `pflis` would fill in an FP32 in exactly the same amount
of space, making it a redundant encoding.  this leaves the only
purpose of `pfishmv` as extending (filling) an FP32 out to an FP64.

that said, the next step in deciding whether it is worthwhile is to count
the I/D-Cache usage.
counting instructions and D-Cache Loads actually shows
that whilst the initial idea for `pfishmv` would be to fill in the
remaining mantissa and high exponent bits to complete a full FP64,
the cost of doing so is:

* 1x32 flis
* 1x32 fishmv
* 1x64 pfishmv

which totals 4 x 32-bit words (all in I-Cache). that is actually *more* than just `lfd`,
which needs only 3 x 32-bit words (one 32-bit instruction in I-Cache plus 64 bits of data in D-Cache).
the only technical reason would therefore be
to avoid the D-Cache entirely, just like the 5-instruction sequence
that writes a 64-bit GPR purely from immediates
(lis, ori, sldi, oris, ori), although that is justifiable
as a critical means of bootstrapping (constructing 64-bit addresses).
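
For comparison, the D-Cache-free GPR sequence can be modelled as shown below. This Python sketch assumes the common five-instruction form `lis`, `ori`, `sldi` (an extended mnemonic of `rldicr`), `oris`, `ori`. The helper names simply mirror those mnemonics and model only the register arithmetic.

```python
MASK64 = (1 << 64) - 1


def exts16(imm: int) -> int:
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x1_0000 if imm & 0x8000 else imm


def lis(imm: int) -> int:             # addis RT,0,imm: sign-extended imm << 16
    return (exts16(imm) << 16) & MASK64


def ori(rs: int, imm: int) -> int:    # OR zero-extended imm into bits 48:63
    return rs | (imm & 0xFFFF)


def oris(rs: int, imm: int) -> int:   # OR zero-extended imm into bits 32:47
    return rs | ((imm & 0xFFFF) << 16)


def sldi(rs: int, n: int) -> int:     # rldicr RA,RS,n,63-n
    return (rs << n) & MASK64


def load_imm64(value: int) -> int:
    """Five 32-bit instructions: 160 bits of I-Cache, no D-Cache access."""
    r = lis(value >> 48)              # bits 0:15 (sign-extension is shifted out below)
    r = ori(r, value >> 32)           # bits 16:31
    r = sldi(r, 32)                   # move the high word into place
    r = oris(r, value >> 16)          # bits 32:47
    return ori(r, value)              # bits 48:63


print(hex(load_imm64(0xFEDC_BA98_7654_3210)))  # 0xfedcba9876543210
```

By the same count, `flis` + `fishmv` + `pfishmv` would occupy 32 + 32 + 64 = 128 bits of I-Cache, compared with 160 bits for this GPR sequence.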

**
4. The Motivation says "Even clearing an FPR to zero presently requires Load".
   What about fsub FRT,FRA,FRA?
**

That doesn't reliably clear FRT to zero, because `NaN - NaN` and
`Inf - Inf` both produce `NaN`, not zero. Also, with "round toward -Infinity",
`0 - 0` produces -0, not +0.  The result of `fsub` is therefore critically
dependent on the contents of registers and on FPSCR settings (the rounding mode),
and `fsub` can itself set FPSCR exception bits (VXISI for `Inf - Inf`).
Guaranteeing a zero would require more instructions, whereas `flis` has no such dependency.
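
The dependence on register contents can be demonstrated with ordinary IEEE 754 binary64 arithmetic. This minimal Python sketch assumes `x - x` is a fair model of `fsub FRT,FRA,FRA`. Python always rounds to nearest, so the round-toward-minus-infinity case (which yields -0) is not shown.

```python
import math


def fsub_self(x: float) -> float:
    """Model fsub FRT,FRA,FRA in the default round-to-nearest mode."""
    return x - x


for fra in (3.5, -0.0, math.inf, -math.inf, math.nan):
    result = fsub_self(fra)
    # "Cleared" means exactly +0.0: equal to zero and with a positive sign bit.
    cleared = result == 0.0 and math.copysign(1.0, result) > 0
    print(f"{fra!r:>5} - itself = {result!r:>5}  cleared to +0: {cleared}")
```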

**
5. "FRS" for both instructions should be changed to "FRT".  ("FRS" normally
   specifies a source register; see Section 1.6.2.  I understand that for
   fishmv the specified register is both source and target.  But "TX,T"
   provides precedent for using the "target form" of register specification
   for such cases.)
6. The RTL for fmvis should use left arrow for assignment.
**

RTL error corrected. ack on FRT. done.

**
7. The architecture spec (VSX chapter) uses "BFP32" and "BFP64", and the
   lower-case versions thereof, for the 32-bit and 64-bit binary FP formats.
   The RFC's "FP32" and "FP64" (and lower case of same) should be made
   consistent with this usage.
**

acknowledged. done.

**
8. More generally, the style of the verbal description for both instructions
   should be made more consistent with the style used in the architecture
   spec.
**

yes, Paul kindly gave advice on that. done.

**
9. In the first clause of the verbal description of fishmv I think "inserted
   into FRS" should be "inserted into the low-order half of the
   single-format value corresponding to the contents of FRT".
   A similar change should be made in the second sentence of the next
   paragraph.
**

ack. done. (actually, removed the duplicate sentence/phrase)

**
10. The paragraph before the Programming Note in the fishmv description
   says "This is strategically similar to how li combined with oris is used
   to construct 32-bit Integers".  li combined with oris works only if bit 16
   of the desired 32-bit integer is 0.  (A better way to construct a 32-bit
   integer is to use pli (extended mnemonic for paddi).)
**

it is unlikely that we (Libre-SOC) will initially implement any of the v3.1
64-bit prefixing: it cannot be Vectorised without resulting in 96-bit
instructions, which we decided is unacceptably long. that said, the extended
range of the LD addressing immediate is extremely useful
(along with the PC-relative modes and other instructions
such as paddi).

bottom line: we have not yet given much thought to using any v3.1 Scalar
Prefixed instructions at all, so we don't even know most of what they do.

that said: if `paddi` puts 32 bits into a GPR, and does so in 64 bits,
is it not similarly redundant, i.e. using exactly the same amount of space
as two 32-bit instructions?  if `paddi` puts *more* than 32 bits
into a GPR then it is not the same, and would not make a suitable
comparative analogy for a Programming Note.
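
The reviewer's point about `li` + `oris` can be checked with a small Python model of the register arithmetic. The helper names mirror the mnemonics, and this is a sketch, not a simulator. `li` sign-extends its immediate, so when bit 16 of the 32-bit value (IBM numbering, i.e. mask 0x8000) is set, the upper halfword is already all ones and `oris` cannot clear it. `lis` followed by `ori` does not have this problem.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def exts16(imm: int) -> int:
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x1_0000 if imm & 0x8000 else imm


def li(imm: int) -> int:               # addi RT,0,imm: sign-extended imm
    return exts16(imm) & MASK64


def lis(imm: int) -> int:              # addis RT,0,imm: sign-extended imm << 16
    return (exts16(imm) << 16) & MASK64


def ori(rs: int, imm: int) -> int:     # OR zero-extended imm into the low halfword
    return rs | (imm & 0xFFFF)


def oris(rs: int, imm: int) -> int:    # OR zero-extended imm << 16
    return rs | ((imm & 0xFFFF) << 16)


def li_oris(value: int) -> int:        # low half first, then OR in the high half
    return oris(li(value), value >> 16) & MASK32


def lis_ori(value: int) -> int:        # high half first, then OR in the low half
    return ori(lis(value >> 16), value) & MASK32


for v in (0x1234_5678, 0x1234_ABCD):
    print(hex(v), hex(li_oris(v)), hex(lis_ori(v)))
```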

# Questions (11 Oct)

**Should the use of DOUBLE() be bypassed?**

No, because we specifically want to be able to express all possible f32 values,
including denormal values. Those denormal values require normalization to get
the corresponding f64 values.
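
This minimal Python sketch shows why the denormal case matters. The smallest binary32 denormal (bit pattern 0x00000001, value 2^-149) is a normal number in binary64, so widening it requires normalization. `naive_widen()` is a hypothetical "bypass" that applies the normal-number bit copy used by `DOUBLE()` unconditionally, without normalizing. The correct widening uses `struct`, which converts finite values exactly.

```python
import struct


def exact_widen(word: int) -> int:
    """binary32 bit pattern -> binary64 bit pattern of the same value (finite inputs)."""
    value = struct.unpack(">f", struct.pack(">I", word))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def naive_widen(word: int) -> int:
    """Hypothetical bypass: DOUBLE()'s normal-number bit copy, applied unconditionally."""
    w1 = (word >> 30) & 1
    return (((word >> 30) & 0b11) << 62) | (((w1 ^ 1) * 0b111) << 59) | ((word & 0x3FFF_FFFF) << 29)


def bits_to_float(bits: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


denormal = 0x0000_0001                     # smallest positive binary32 denormal
print(hex(exact_widen(denormal)), bits_to_float(exact_widen(denormal)))
print(hex(naive_widen(denormal)), bits_to_float(naive_widen(denormal)))
```

The naive pattern decodes to roughly 2^-127, about 2^22 times too large. For normal inputs both functions agree, which is why only the denormal path needs the normalizing step.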