# Links

* [[sv/int_fp_mv]]

# Questions (09 oct 2022)

**
1. What is "BF16"?  It seems not to be mentioned in the architecture spec.
   The architecture spec (VSX chapter) defines two 16-bit binary FP formats.
   Judging by the way the RFC uses "BF16", I think it means what the VSX
   chapter calls "bfloat16", which has the exponent in the same bits as
   single format.  This should be clarified, and the corresponding format
   will need to be defined in Section 4.3.1 (Data Format).
**

BF16 seems to be an equally commonly used term for bfloat16, yes.
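For reference, bfloat16 is the high-order 16 bits of a BFP32 (single-format) value: the sign bit, the same 8-bit exponent field, and the 7 most-significant fraction bits. The Python sketch below illustrates that layout by truncation. The helper names are hypothetical, and the sketch deliberately ignores rounding (many practical float-to-bfloat16 conversions round to nearest even instead).

```python
import struct

def f32_bits(x: float) -> int:
    """Return the BFP32 (single-format) bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_bf16_trunc(x: float) -> int:
    """Hypothetical helper: bfloat16 by truncation = high 16 bits of BFP32."""
    return f32_bits(x) >> 16

def bf16_to_float(h: int) -> float:
    """Widen bfloat16 back to a float by appending 16 zero fraction bits."""
    return struct.unpack(">f", struct.pack(">I", (h & 0xFFFF) << 16))[0]

# Sign, exponent and top fraction bits sit in the same positions as in BFP32.
print(hex(to_bf16_trunc(1.0)), hex(to_bf16_trunc(-2.0)))
print(hex(to_bf16_trunc(3.140625)), bf16_to_float(to_bf16_trunc(3.140625)))
```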

**
2. For fishmv, what happens if the value supplied in the FPR is not
   representable in single format?
**

Exactly the same thing as if `lfd` were used to load an "unrepresentable"
value: nothing.  If `lfd` raised flags or exceptions then so would (should)
`fishmv`.

**
3. The first clause of the verbal description of fishmv seems to assume
   that the contents of the specified register were produced by fmvis.
   Is there any other use of fishmv?  If yes, the verbal description should
   be generalized.  If no, the wording should be explicit about this use.
**

Given that the bits are spread out in `DOUBLE()` format, it seems unlikely.
If the bits were placed contiguously (sequentially) then it would indeed
be a different matter.
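To illustrate the point about `DOUBLE()`, here is a minimal Python model under stated assumptions: `fmvis` places its 16-bit immediate in the high-order half of a BFP32 value and writes DOUBLE() of it to the FPR, and `fishmv` inserts its immediate into the low-order half of the single-format value corresponding to the FPR contents (the wording proposed in item 9 of "Other"). The function names mirror the mnemonics, but this is a hypothetical sketch, not the RFC's RTL. DOUBLE() is approximated by the host's single-to-double conversion via `struct`, which is exact for non-NaN values.

```python
import struct

def double_from_single_bits(s: int) -> int:
    """DOUBLE(): widen a BFP32 bit pattern to BFP64 (exact for non-NaN values)."""
    f = struct.unpack(">f", struct.pack(">I", s & 0xFFFFFFFF))[0]
    return struct.unpack(">Q", struct.pack(">d", f))[0]

def single_bits_from_double(d: int) -> int:
    """Inverse of DOUBLE() for BFP64 values that came from a BFP32 value."""
    f = struct.unpack(">d", struct.pack(">Q", d))[0]
    return struct.unpack(">I", struct.pack(">f", f))[0]

def fmvis(imm16: int) -> int:
    """Hypothetical model: imm16 becomes the high half of a BFP32 value."""
    return double_from_single_bits((imm16 & 0xFFFF) << 16)

def fishmv(frt: int, imm16: int) -> int:
    """Hypothetical model: insert imm16 into the low half of the single value."""
    s = single_bits_from_double(frt)
    s = (s & 0xFFFF0000) | (imm16 & 0xFFFF)
    return double_from_single_bits(s)

frt = fmvis(0x3F80)        # high half of 1.0 in single format
frt = fishmv(frt, 0x0001)  # low half: single-format value 0x3F800001
print(hex(frt))
```

For a normal single-format value, the 16 bits written by `fishmv` land in BFP64 bits 29 to 44 (counting from the least-significant bit), not in a contiguous low-order field of the FPR.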

**
4. The instruction names and mnemonics should be more consistent with the
   architecture spec.  In particular, the architecture spec tends to use
   "Move" for instructions that transfer data between registers.  Here are
   two approaches.

   a. Model the instructions on li (Load Immediate), an extended mnemonic for
      addi.
        fmvis --> Floating Load Immediate Single (flis)
        fishmv --> Floating Load Immediate Single Lower (flisl)
      Under this approach the new instructions would belong in their own
      3-level section, after Section 4.6.4 (Floating-Point Load and Store
      Double Pair Instructions).

   b. Model the instructions on lxvkq (and the existing FP Load instructions)
        fmvis --> Load Floating-Point Single Immediate (lfsi)
        fishmv --> Load Floating-Point Single Immediate Lower (lfsil)
      Under this approach the new instructions would belong in Section 4.6.2
      (Floating-Point Load Instructions), with the Load Floating-Point
      Single instructions.

   I prefer (a), because I think it's confusing to treat these instructions,
   which don't access storage, like instructions that do access storage.
**

The fact that they bypass the D-Cache, and correspondingly raise no flags or
exceptions, is the connection to `ld`.  Despite that, I like (a) as well,
although for purely non-technical reasons: it is more "memorable".  I do
love the two mnemonics `flis fishmv` :)

We picked "s" on the end of `fmvis` (`flis`) because it is "shifted"
(like `oris`).

Other:

**
1. The RFC should be based on the current version of the architecture,
   which is V. 3.1B.  I believe this has no effect on the substance of the
   RFC.  But it affects the identities of the instruction-list appendices,
   which in V. 3.1B are E, F, G, and H.
**

Acknowledged.  Will edit.

**
2. Additional affected sections are 1.6.1.6 (additional line for DX-form),
   1.6.2 (additional use for d0,d1,d2), and Appendix D (Opcode Maps).
**

Ditto.

**
3. Does the last line of the Summary apply to both instructions or just to
   fishmv?  I can see why you would want a prefixed version of fmvis, which
   would supply the entire 32-bit FP single format value and avoid the need
   for fishmv.  Why would you want a prefixed version of fishmv?
**

The analysis counting instructions and D-Cache Loads actually shows
that whilst the initial idea for `pfishmv` would be to fill in the
remaining mantissa and high exponent bits to complete a full FP64,
the cost of doing so is:

* 1x32 flis
* 1x32 fishmv
* 1x64 pfishmv

which totals 16 bytes of instructions (32 + 32 + 64 bits), actually *more*
than just `lfd`, which needs only 12 bytes (a 4-byte instruction plus the
8-byte constant in memory).  The only technical reason for the sequence
would therefore be to avoid the D-Cache entirely, just like the
5-instruction sequence that writes a 64-bit GPR only from immediates
(li, oris, rldicl, li, oris).
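For comparison, one widely used immediate-only form of that 5-instruction sequence is `lis`, `ori`, `sldi` (extended mnemonic for `rldicr`), `oris`, `ori`. It is shown here as an assumption and may differ in detail from the sequence listed above. The Python sketch models each instruction on a 64-bit register value (helper names are hypothetical) to show that every bit comes from an immediate, with no load and hence no D-Cache access.

```python
MASK64 = (1 << 64) - 1

def sext16(v: int) -> int:
    """Sign-extend a 16-bit immediate to 64 bits (as addi/addis do)."""
    v &= 0xFFFF
    return (v - 0x10000 if v & 0x8000 else v) & MASK64

def lis(imm: int) -> int:           # addis rT,0,SI: sign-extended, shifted 16
    return (sext16(imm) << 16) & MASK64

def ori(r: int, imm: int) -> int:   # UI is zero-extended
    return r | (imm & 0xFFFF)

def oris(r: int, imm: int) -> int:  # UI is zero-extended, then shifted 16
    return r | ((imm & 0xFFFF) << 16)

def sldi(r: int, n: int) -> int:    # rldicr rT,rS,n,63-n
    return (r << n) & MASK64

def load_imm64(value: int) -> int:
    """Build a 64-bit constant from four 16-bit immediates in 5 instructions."""
    h3, h2, h1, h0 = [(value >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    r = lis(h3)      # h3 into bits 31..16; sign-extension bits shift out below
    r = ori(r, h2)   # h2 into bits 15..0
    r = sldi(r, 32)  # high word moves to bits 63..32
    r = oris(r, h1)  # h1 into bits 31..16
    r = ori(r, h0)   # h0 into bits 15..0
    return r

print(hex(load_imm64(0x123456789ABCDEF0)))
```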

**
4. The Motivation says "Even clearing an FPR to zero presently requires Load".
   What about fsub FRT,FRA,FRA?
**

I didn't know about it!  Technically, though, that reads registers
(unless micro-code-redirected to an internal zeroing operation).
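A further point in favour of a dedicated instruction: `fsub FRT,FRA,FRA` only yields zero when FRA holds a finite value. Under IEEE 754 arithmetic (which Python floats follow, in round-to-nearest mode), an infinity minus itself is an invalid operation that produces a NaN, and a NaN input propagates; in the Power ISA the infinity case also sets VXISI in the FPSCR. In round-toward-negative-infinity mode, a finite value minus itself gives -0 rather than +0. The sketch below only demonstrates the round-to-nearest behaviour, and `fsub_same` is a hypothetical name.

```python
import math

def fsub_same(fra: float) -> float:
    """Model of fsub FRT,FRA,FRA using host IEEE 754 (round-to-nearest)."""
    return fra - fra

# Finite inputs give +0.0; infinities and NaNs give NaN, not zero.
for value in (1.5, -3.0, 0.0, math.inf, -math.inf, math.nan):
    result = fsub_same(value)
    print(f"{value!r:>6} - itself = {result!r}")
```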

**
5. "FRS" for both instructions should be changed to "FRT".  ("FRS" normally
   specifies a source register; see Section 1.6.2.  I understand that for
   fishmv the specified register is both source and target.  But "TX,T"
   provides precedent for using the "target form" of register specification
   for such cases.)
6. The RTL for fmvis should use left arrow for assignment.
**

**
7. The architecture spec (VSX chapter) uses "BFP32" and "BFP64", and the
   lower-case versions thereof, for the 32-bit and 64-bit binary FP formats.
   The RFC's "FP32" and "FP64" (and lower case of same) should be made
   consistent with this usage.
**

**
8. More generally, the style of the verbal description for both instructions
   should be made more consistent with the style used in the architecture
   spec.
**

**
9. In the first clause of the verbal description of fishmv I think "inserted
   into FRS" should be "inserted into the low-order half of the single-
   format value corresponding to the contents of FRT".
   A similar change should be made in the second sentence of the next
   paragraph.
**

**
10. The paragraph before the Programming Note in the fishmv description
   says "This is strategically similar to how li combined with oris is used
   to construct 32-bit Integers".  li combined with oris works only if bit 16
   of the desired 32-bit integer is 0.  (A better way to construct a 32-bit
   integer is to use pli (extended mnemonic for paddi).)
**
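The limitation in item 10 can be seen with a small Python model (hypothetical helper names; registers treated as 64-bit values, results masked to 32 bits). `li` is `addi rT,0,SI` and sign-extends its immediate, so when bit 16 of the 32-bit target (MSB0 numbering, i.e. the top bit of the low-order halfword) is 1, the upper bits are already all ones and a following `oris` cannot clear them. Pairing `lis` with `ori` avoids this, because `ori` zero-extends its immediate.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def sext16(v: int) -> int:
    """Sign-extend a 16-bit immediate to a 64-bit register value."""
    v &= 0xFFFF
    return (v - 0x10000 if v & 0x8000 else v) & MASK64

def li(imm: int) -> int:            # addi rT,0,SI: sign-extended
    return sext16(imm)

def lis(imm: int) -> int:           # addis rT,0,SI: sign-extended, shifted 16
    return (sext16(imm) << 16) & MASK64

def ori(r: int, imm: int) -> int:   # UI is zero-extended
    return r | (imm & 0xFFFF)

def oris(r: int, imm: int) -> int:  # UI is zero-extended, then shifted 16
    return r | ((imm & 0xFFFF) << 16)

def li_oris(value32: int) -> int:
    """li low half, then oris high half; low 32 bits of the result."""
    return oris(li(value32 & 0xFFFF), value32 >> 16) & MASK32

def lis_ori(value32: int) -> int:
    """lis high half, then ori low half; low 32 bits of the result."""
    return ori(lis(value32 >> 16), value32 & 0xFFFF) & MASK32

for v in (0x12345678, 0x12348000):
    print(hex(v), hex(li_oris(v)), hex(lis_ori(v)))
```

In the full 64-bit register, `lis`/`ori` leaves the 32-bit value sign-extended; the reviewer's `pli` suggestion covers any 32-bit value in a single prefixed instruction.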