From: Luke Kenneth Casson Leighton Date: Thu, 25 May 2023 00:32:43 +0000 (+0100) Subject: forgot move ls002 directory X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=3e9da0e2b36290dd9ca8a0685a72671971ad2a2b;p=libreriscv.git forgot move ls002 directory --- diff --git a/openpower/sv/rfc/ls002.fmi/discussion.mdwn b/openpower/sv/rfc/ls002.fmi/discussion.mdwn new file mode 100644 index 000000000..77705be5b --- /dev/null +++ b/openpower/sv/rfc/ls002.fmi/discussion.mdwn @@ -0,0 +1,230 @@ +# Links + +* [[sv/int_fp_mv]] + +# v3.1 Prefixed instructions + +**PREFIXED INSTRUCTIONS ARE 100% OUT OF SCOPE OF THIS RFC**. + +please do not extend the scope of this RFC beyond the two +32-bit instructions. + +# Questions (09 oct 2022) + +**Substantive or semi-substantive:** + +** +1. What is "BF16"? It seems not to be mentioned in the architecture spec. + The architecture spec (VSX chapter) defines two 16-bit binary FP formats. + Judging by the way the RFC uses "BF16", I think it means what the VSX + chapter calls "bfloat16", which has the exponent in the same bits as + single format. This should be clarified, and the corresponding format + will need to be defined in Section 4.3.1 (Data Format). +** + +BF16 seems to be an equally commonly used term for bfloat16, yes. +done, added. + +** +2. For fishmv, what happens if the value supplied in the FPR is not + representable in single format? +** + +I'm assuming you're asking what happens if something like `f3 = 0x0080_0000_0000_0001` and `fishmv f3, 0xABCD` is executed: +Exactly the same thing as if the FPR value isn't representable in f32 format for stfs -- the value stored is defined by the `SINGLE` pseudo-code function, no fp status bits are set. Likewise, the input f32 value for fishmv is determined by the `SINGLE` pseudo-code function, and no fp status bits are set; fishmv then replaces the lower 16 bits of the f32 value with the immediate, converts the resulting f32 back to f64 using `DOUBLE`, and stores it in FRT.
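The fmvis/fishmv sequence described in the answer above can be sketched in Python. This is a minimal model, not the RFC's pseudo-code: it assumes the FPR contents are exactly representable in single format, so that `SINGLE` and `DOUBLE` can be approximated by ordinary IEEE 754 conversions through `struct`. The names `single_model`, `double_model`, `fmvis` and `fishmv` are hypothetical helpers, and the non-representable case raised in the question is deliberately not modelled.

```python
import struct

def double_model(word):
    """Assumed stand-in for DOUBLE(): widen single-format bits to double-format bits."""
    value = struct.unpack(">f", struct.pack(">I", word))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def single_model(dword):
    """Assumed stand-in for SINGLE(); only valid for exactly representable values."""
    value = struct.unpack(">d", struct.pack(">Q", dword))[0]
    return struct.unpack(">I", struct.pack(">f", value))[0]

def fmvis(imm16):
    """Place the immediate in the upper 16 bits of a single-format word, then widen."""
    return double_model((imm16 & 0xFFFF) << 16)

def fishmv(frt, imm16):
    """Replace the lower 16 bits of the single-format view of frt, then widen again."""
    word = single_model(frt)
    word = (word & 0xFFFF0000) | (imm16 & 0xFFFF)
    return double_model(word)

# Build the single-format pattern 0x40490FDB (pi rounded to single) in two steps.
f3 = fmvis(0x4049)
f3 = fishmv(f3, 0x0FDB)
print(hex(f3))
```

Here `fmvis` supplies the upper 16 bits of the single-format value (the BF16 pattern) and `fishmv` fills in the lower 16 bits; the register always holds the double-format widening of that value.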
+ +Ultimately, these are immediates, statically-compiled. if the developer +wants "invalid" data, statically-compiled into a binary, it is reasonable +to assume they have good reasons for doing so. + +** +3. The first clause of the verbal description of fishmv seems to assume + that the contents of the specified register were produced by fmvis. + Is there any other use of fishmv? If yes, the verbal description should + be generalized. If no, the wording should be explicit about this use. +** + +given that the bits are spread out in `DOUBLE()` format it seems unlikely. +if the bits were placed contiguously (sequentially) then it would indeed +be a different matter: temporary storage for constants to be transferred +directly (unmodified) to GPRs for example. but DOUBLE() formatting +makes that not possible unfortunately. + +however alternative uses by programmers cannot be ruled out. it may +be the case that despite the format being DOUBLE() there is in fact +an FPR->GPR transfer instruction that can at least get the 32-bits +of immediate back out as a contiguous undamaged block. thus adding +notes that may turn out to be restrictive is inadvisable. + +additional note: DOUBLE() has been noted to perform normalisation. +this would make alternative uses even more unlikely. + +** +4. The instruction names and mnemonics should be more consistent with the + architecture spec. In particular, the architecture spec tends to use + "Move" for instructions that transfer data between registers. Here are + two approaches. +** + +``` + a. Model the instructions on li (Load Immediate), an extended mnemonic for + addi. + fmvis --> Floating Load Immediate Single (flis) + fishmv --> Floating Load Immediate Single Lower (flisl) + Under this approach the new instructions would belong in their own + 3-level section, after Section 4.6.4 (Floating-Point Load and Store + Double Pair Instructions). + + b. 
Model the instructions on lxvkq (and the existing FP Load instructions) + fmvis --> Load Floating-Point Single Immediate (lfsi) + fishmv --> Load Floating-Point Single Immediate Lower (lfsil) + Under this approach the new instructions would belong in Section 4.6.2 + (Floating-Point Load Instructions), with the Load Floating-Point + Single instructions. + + I prefer (a), because I think it's confusing to treat these instructions, + which don't access storage, like instructions that do access storage. +``` + +the fact that they bypass D-Cache and correspondingly raise no flags or +exceptions is the connection to `ld`. despite that i like (a) as well +although for purely non-technical reasons (more "memorable") i (Luke) do love +the two mnemonics `flis fishmv` :) + +we picked "s" on the end of `fmvis` (`flis`) because it is "shifted" +(like `oris`), not "single". + +**Other:** + +** +1. The RFC should be based on the current version of the architecture, + which is V. 3.1B. I believe this has no effect on the substance of the + RFC. But it affects the identities of the instruction-list appendices, + which in V. 3.1B are E, F, G, and H. +** + +acknowledged. will edit. done v3.1B, done EFGH. + +** +2. Additional affected sections are 1.6.1.6 (additional line for DX-form), + 1.6.2 (additional use for d0,d1,d2), and Appendix D (Opcode Maps). +** + +ditto. done 1.6.2 (FRS) + +missed the addition to 1.6.1.6 (DX-Form). done + +** +3. Does the last line of the Summary apply to both instructions or just to + fishmv? I can see why you would want a prefixed version of fmvis, which + would supply the entire 32-bit FP single format value and avoid the need + for fishmv. Why would you want a prefixed version of fishmv? +** + +the more interesting initial question is, "why no `pflis`?" and +the answer to that is "because flis and fishmv do exactly the same +job in exactly the same amount of bits" (64). 
+`flis` fills in a BF16, `fishmv` extends to an FP32, +and `pflis` would fill in an FP32 in exactly the same amount +of space, making it a redundant encoding. this just leaves the +purpose of `pfishmv` to be to extend (fill) an FP32 out to an FP64. + +that said: the next phase of assessing whether it is worthwhile is to count the +I/D-Cache usage. +the analysis counting instructions and D-Cache Loads actually shows +that whilst the initial idea for `pfishmv` would be to fill in the +remaining mantissa and high exponent bits to complete a full FP64, +the cost of doing so is: + +* 1x32 flis +* 1x32 fishmv +* 1x64 pfishmv + +which totals QTY 4 of 32-bits (across I-Cache) which is actually *more* than just `lfd`, +which is only QTY 3 of 32-bits (across both I-Cache and D-Cache: 1x32 for the instruction, 1x64 for the data). +the only technical reason therefore is +to avoid D-Cache entirely, just like the 5-instruction sequence +that writes a 64-bit GPR only from immediates +(li, oris, rldicl, li, oris) although that is justifiable +as a critical means of bootstrapping (constructing 64-bit addresses). + +** +4. The Motivation says "Even clearing an FPR to zero presently requires Load". + What about fsub FRT,FRA,FRA? +** + +That doesn't actually clear FRT to zero because `NaN - NaN` and +`Inf - Inf` both equal `NaN`, not zero. Also, with "round to -inf", +0 - 0 produces -0, not 0. Thus use of `fsub` is critically +dependent on the contents of registers and status flags, and +would require more instructions, whereas `flis` is not. + +** +5. "FRS" for both instructions should be changed to "FRT". ("FRS" normally + specifies a source register; see Section 1.6.2. I understand that for + fishmv the specified register is both source and target. But "TX,T" + provides precedent for using the "target form" of register specification + for such cases.) +6. The RTL for fmvis should use left arrow for assignment. +** + +RTL error corrected. ack on FRT. done. + +** +7. 
The architecture spec (VSX chapter) uses "BFP32" and "BFP64", and the + lower-case versions thereof, for the 32-bit and 64-bit binary FP formats. + The RFC's "FP32" and "FP64" (and lower case of same) should be made + consistent with this usage. +** + +acknowledged. done. + +** +8. More generally, the style of the verbal description for both instructions + should be made more consistent with the style used in the architecture + spec. +** + +yes Paul kindly gave advice on that. done. + +** +9. In the first clause of the verbal description of fishmv I think "inserted + into FRS" should be "inserted into the low-order half of the single- + format value corresponding to the contents of FRT". + A similar change should be made in the second sentence of the next + paragraph. +** + +ack. done. (actually, removed the duplicate sentence/phrase) + +** +10. The paragraph before the Programming Note in the fishmv description + says "This is strategically similar to how li combined with oris is used + to construct 32-bit Integers". li combined with oris works only if bit 16 + of the desired 32-bit integer is 0. (A better way to construct a 32-bit + integer is to use pli (extended mnemonic for paddi).) +** + +it is unlikely that we (Libre-SOC) will initially implement any of v3.1 +64-bit prefixing (it cannot be Vectorised, resulting unacceptably in +96-bit instructions which we decided is too much). that said, the LD +addressing immediate extended range is extremely useful +(along with the PC-relative modes and also other instructions +such as paddi). + +bottom line we have not yet given much thought to using any v3.1 Scalar +Prefixed instructions, at all, so don't even know most of what they do. + +that said: if `paddi` puts 32-bits into a GPR, and does so in 64 bits, +is it not similarly redundant i.e. exactly the same amount of space +used as two 32-bit instructions? 
if `paddi` puts *more* than 32 bits +into a GPR then it is not the same and would not make a suitable +comparative analogy as a Programmer's Note. + +# Questions (11 Oct) + +**Should the use of DOUBLE() be bypassed?** + +No, because we specifically want to be able to express all possible f32 values, +including denormal values. those denormal values require normalization to get +the corresponding f64 values. diff --git a/openpower/sv/rfc/ls002/discussion.mdwn b/openpower/sv/rfc/ls002/discussion.mdwn deleted file mode 100644 index 77705be5b..000000000 --- a/openpower/sv/rfc/ls002/discussion.mdwn +++ /dev/null @@ -1,230 +0,0 @@ -# Links - -* [[sv/int_fp_mv]] - -# v3.1 Prefixed instructions - -**PREFIXED INSTRUCTIONS ARE 100% OUT OF SCOPE OF THIS RFC**. - -please do not extend the scope of this RFC beyond the two -32-bit instructions. - -# Questions (09 oct 2022) - -**Substantive or semi-substantive:** - -** -1. What is "BF16"? It seems not to be mentioned in the architecture spec. - The architecture spec (VSX chapter) defines two 16-bit binary FP formats. - Judging by the way the RFC uses "BF16", I think it means what the VSX - chapter calls "bfloat16", which has the exponent in the same bits as - single format. This should be clarified, and the corresponding format - will need to be defined in Section 4.3.1 (Data Format). -** - -BF16 seems to be an equally commonly used term for bfloat16, yes. -done, added. - -** -2. For fishmv, what happens if the value supplied in the FPR is not - representable in single format? -** - -I'm assuming you're asking what happens if something like `f3 = 0x0080_0000_0000_0001` and `fishmv f3, 0xABCD` is executed: -Exactly the same thing as if the FPR value isn't representable in f32 format for stfs -- the value stored is defined by the `SINGLE` pseudo-code function, no fp status bits are set. 
Likewise, the input f32 value for fishmv is determined by the `SINGLE` pseudo-code function, and no fp status bits are set; fishmv then replaces the lower 16 bits of the f32 value with the immediate, converts the resulting f32 back to f64 using `DOUBLE`, and stores it in FRT. - -Ultimately, these are immediates, statically-compiled. if the developer -wants "invalid" data, statically-compiled into a binary, it is reasonable -to assume they have good reasons for doing so. - -** -3. The first clause of the verbal description of fishmv seems to assume - that the contents of the specified register were produced by fmvis. - Is there any other use of fishmv? If yes, the verbal description should - be generalized. If no, the wording should be explicit about this use. -** - -given that the bits are spread out in `DOUBLE()` format it seems unlikely. -if the bits were placed contiguously (sequentially) then it would indeed -be a different matter: temporary storage for constants to be transferred -directly (unmodified) to GPRs for example. but DOUBLE() formatting -makes that not possible unfortunately. - -however alternative uses by programmers cannot be ruled out. it may -be the case that despite the format being DOUBLE() there is in fact -an FPR->GPR transfer instruction that can at least get the 32-bits -of immediate back out as a contiguous undamaged block. thus adding -notes that may turn out to be restrictive is inadvisable. - -additional note: DOUBLE() has been noted to perform normalisation. -this would make alternative uses even more unlikely. - -** -4. The instruction names and mnemonics should be more consistent with the - architecture spec. In particular, the architecture spec tends to use - "Move" for instructions that transfer data between registers. Here are - two approaches. -** - -``` - a. Model the instructions on li (Load Immediate), an extended mnemonic for - addi. 
- fmvis --> Floating Load Immediate Single (flis) - fishmv --> Floating Load Immediate Single Lower (flisl) - Under this approach the new instructions would belong in their own - 3-level section, after Section 4.6.4 (Floating-Point Load and Store - Double Pair Instructions). - - b. Model the instructions on lxvkq (and the existing FP Load instructions) - fmvis --> Load Floating-Point Single Immediate (lfsi) - fishmv --> Load Floating-Point Single Immediate Lower (lfsil) - Under this approach the new instructions would belong in Section 4.6.2 - (Floating-Point Load Instructions), with the Load Floating-Point - Single instructions. - - I prefer (a), because I think it's confusing to treat these instructions, - which don't access storage, like instructions that do access storage. -``` - -the fact that they bypass D-Cache and correspondingly raise no flags or -exceptions is the connection to `ld`. despite that i like (a) as well -although for purely non-technical reasons (more "memorable") i (Luke) do love -the two mnemonics `flis fishmv` :) - -we picked "s" on the end of `fmvis` (`flis`) because it is "shifted" -(like `oris`), not "single". - -**Other:** - -** -1. The RFC should be based on the current version of the architecture, - which is V. 3.1B. I believe this has no effect on the substance of the - RFC. But it affects the identities of the instruction-list appendices, - which in V. 3.1B are E, F, G, and H. -** - -acknowledged. will edit. done v3.1B, done EFGH. - -** -2. Additional affected sections are 1.6.1.6 (additional line for DX-form), - 1.6.2 (additional use for d0,d1,d2), and Appendix D (Opcode Maps). -** - -ditto. done 1.6.2 (FRS) - -missed the addition to 1.6.1.6 (DX-Form). done - -** -3. Does the last line of the Summary apply to both instructions or just to - fishmv? I can see why you would want a prefixed version of fmvis, which - would supply the entire 32-bit FP single format value and avoid the need - for fishmv. 
Why would you want a prefixed version of fishmv? -** - -the more interesting initial question is, "why no `pflis`?" and -the answer to that is "because flis and fishmv do exactly the same -job in exactly the same amount of bits" (64). -`flis` fills in a BF16, `fishmv` extends to an FP32, -and `pflis` would fill in an FP32 in exactly the same amount -of space, making it a redundant encoding. this just leaves the -purpose of `pfishmv` to be to extend (fill) an FP32 out to an FP64. - -that said: the next phase of assessing whether it is worthwhile is to count the -I/D-Cache usage. -the analysis counting instructions and D-Cache Loads actually shows -that whilst the initial idea for `pfishmv` would be to fill in the -remaining mantissa and high exponent bits to complete a full FP64, -the cost of doing so is: - -* 1x32 flis -* 1x32 fishmv -* 1x64 pfishmv - -which totals QTY 4 of 32-bits (across I-Cache) which is actually *more* than just `lfd`, -which is only QTY 3 of 32-bits (across both I-Cache and D-Cache: 1x32 for the instruction, 1x64 for the data). -the only technical reason therefore is -to avoid D-Cache entirely, just like the 5-instruction sequence -that writes a 64-bit GPR only from immediates -(li, oris, rldicl, li, oris) although that is justifiable -as a critical means of bootstrapping (constructing 64-bit addresses). - -** -4. The Motivation says "Even clearing an FPR to zero presently requires Load". - What about fsub FRT,FRA,FRA? -** - -That doesn't actually clear FRT to zero because `NaN - NaN` and -`Inf - Inf` both equal `NaN`, not zero. Also, with "round to -inf", -0 - 0 produces -0, not 0. Thus use of `fsub` is critically -dependent on the contents of registers and status flags, and -would require more instructions, whereas `flis` is not. - -** -5. "FRS" for both instructions should be changed to "FRT". ("FRS" normally - specifies a source register; see Section 1.6.2. I understand that for - fishmv the specified register is both source and target. 
But "TX,T" - provides precedent for using the "target form" of register specification - for such cases.) -6. The RTL for fmvis should use left arrow for assignment. -** - -RTL error corrected. ack on FRT. done. - -** -7. The architecture spec (VSX chapter) uses "BFP32" and "BFP64", and the - lower-case versions thereof, for the 32-bit and 64-bit binary FP formats. - The RFC's "FP32" and "FP64" (and lower case of same) should be made - consistent with this usage. -** - -acknowledged. done. - -** -8. More generally, the style of the verbal description for both instructions - should be made more consistent with the style used in the architecture - spec. -** - -yes Paul kindly gave advice on that. done. - -** -9. In the first clause of the verbal description of fishmv I think "inserted - into FRS" should be "inserted into the low-order half of the single- - format value corresponding to the contents of FRT". - A similar change should be made in the second sentence of the next - paragraph. -** - -ack. done. (actually, removed the duplicate sentence/phrase) - -** -10. The paragraph before the Programming Note in the fishmv description - says "This is strategically similar to how li combined with oris is used - to construct 32-bit Integers". li combined with oris works only if bit 16 - of the desired 32-bit integer is 0. (A better way to construct a 32-bit - integer is to use pli (extended mnemonic for paddi).) -** - -it is unlikely that we (Libre-SOC) will initially implement any of v3.1 -64-bit prefixing (it cannot be Vectorised, resulting unacceptably in -96-bit instructions which we decided is too much). that said, the LD -addressing immediate extended range is extremely useful -(along with the PC-relative modes and also other instructions -such as paddi). - -bottom line we have not yet given much thought to using any v3.1 Scalar -Prefixed instructions, at all, so don't even know most of what they do. 
- -that said: if `paddi` puts 32-bits into a GPR, and does so in 64 bits, -is it not similarly redundant i.e. exactly the same amount of space -used as two 32-bit instructions? if `paddi` puts *more* than 32 bits -into a GPR then it is not the same and would not make a suitable -comparative analogy as a Programmer's Note. - -# Questions (11 Oct) - -**Should the use of DOUBLE() be bypassed?** - -No, because we specifically want to be able to express all possible f32 values, -including denormal values. those denormal values require normalization to get -the corresponding f64 values.
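The need for normalization mentioned in the answer above can be illustrated with a short Python sketch. It assumes that `DOUBLE` behaves like an ordinary IEEE 754 single-to-double widening for the values shown; the helper `double_model` is hypothetical and is not the spec's pseudo-code. It prints the double-format exponent and fraction fields for the smallest and largest single-format denormals.

```python
import struct

def double_model(word):
    """Assumed stand-in for DOUBLE(): widen single-format bits to double-format bits."""
    value = struct.unpack(">f", struct.pack(">I", word))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def fields(dword):
    """Split double-format bits into (biased exponent, 52-bit fraction)."""
    return (dword >> 52) & 0x7FF, dword & ((1 << 52) - 1)

# Single-format denormals have a zero exponent field and a non-zero fraction.
smallest = double_model(0x00000001)   # 2**-149
largest = double_model(0x007FFFFF)    # (2**23 - 1) * 2**-149

for name, bits in (("smallest", smallest), ("largest", largest)):
    exponent, fraction = fields(bits)
    print(f"{name}: bits={bits:#018x} exponent={exponent} fraction={fraction:#x}")
```

In both cases the double-format exponent field is non-zero, so the fraction bits have been shifted rather than copied across unchanged, which is why the single-format bit pattern does not appear as a contiguous block in the FPR.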