From 5b3e3929d3102937ee4bd00d40c86ebe1b037519 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Wed, 15 Mar 2023 01:01:12 -0700 Subject: [PATCH] rewrite int_fp_mv_reduced_insn_count to account for reduced instructions changes --- .../sv/int_fp_mv_reduced_insn_count.mdwn | 429 +++++++++++++----- 1 file changed, 318 insertions(+), 111 deletions(-) diff --git a/openpower/sv/int_fp_mv_reduced_insn_count.mdwn b/openpower/sv/int_fp_mv_reduced_insn_count.mdwn index 8c4ccaa02..5e43fe612 100644 --- a/openpower/sv/int_fp_mv_reduced_insn_count.mdwn +++ b/openpower/sv/int_fp_mv_reduced_insn_count.mdwn @@ -284,51 +284,107 @@ fmvis f4, 0x3F80 # writes +1.0 to f4 fishmv f4, 0x8000 # writes +1.00390625 to f4 ``` +# Immediate Tables + +Tables that are used by `fmvtg`/`fmvfg`/`fcvttg`/`fcvtfg`: + +## `RCS` -- `Rc` and `s` + +| `RCS` | `Rc` | FP Single Mode | Assembly Alias Mnemonic | +|-------|------|----------------|-------------------------| +| 0 | 0 | Double | `` | +| 1 | 1 | Double | `.` | +| 2 | 0 | Single | `s` | +| 3 | 1 | Single | `s.` | + +## `IT` -- Integer Type + +| `IT` | Integer Type | Assembly Alias Mnemonic | +|------|-----------------|-------------------------| +| 0 | Signed 32-bit | `w` | +| 1 | Unsigned 32-bit | `uw` | +| 2 | Signed 64-bit | `d` | +| 3 | Unsigned 64-bit | `ud` | + +## `CVM` -- Float to Integer Conversion Mode + +| `CVM` | `rounding_mode` | Semantics | +|-------|-----------------|----------------------------------| +| 000 | from `FPSCR` | [OpenPower semantics] | +| 001 | Truncate | [OpenPower semantics] | +| 010 | from `FPSCR` | [Java semantics] | +| 011 | Truncate | [Java semantics] | +| 100 | from `FPSCR` | [JavaScript semantics] | +| 101 | Truncate | [JavaScript semantics] | +| rest | -- | illegal instruction trap for now | + +[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics +[Java semantics]: #fp-to-int-java-conversion-semantics +[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics + # Moves These instructions 
perform a straight unaltered bit-level copy from one Register File to another. -# FPR to GPR moves +## FPR to GPR move + +`fmvtg RT, FRB, RCS` -* `fmvtg RT, FRA` -* `fmvtg. RT, FRA` +| 0-5 | 6-10 | 11-15 | 16-20 | 21-29 | 30-31 | Form | +|-----|------|-------|-------|-------|-------|--------| +| PO | RT | 0 | FRB | XO | RCS | X-Form | -move a 64-bit float from a FPR to a GPR, just copying bits directly. -As a direct bitcopy, no exceptions occur and no status flags are set. +``` +if RCS[0] = 1 then # if Single mode + RT <- [0] * 32 || SINGLE((FRB)) # SINGLE since that's what stfs uses +else + RT <- (FRB) +``` + +move a 32/64-bit float from a FPR to a GPR, just copying bits of the IEEE 754 representation directly. This is equivalent to `stfs` followed by `lwz` or equivalent to `stfd` followed by `ld`. +As `fmvtg` is just copying bits, `FPSCR` is not affected in any way. Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point operations. -* `fmvtgs RT, FRA` -* `fmvtgs. RT, FRA` +### Assembly Aliases -move a 32-bit float from a FPR to a GPR, just copying bits. Converts the -64-bit float in `FRA` to a 32-bit float, using the same method as `stfs`, -then writes the 32-bit float to `RT`, setting the high 32-bits to zeros. -Effectively, `fmvtgs` is a macro-fusion of `stfs` and `lwz` and therefore -does not behave like `frsp` and does not set any fp exception flags. +| Assembly Alias | Full Instruction | +|-------------------|--------------------| +| `fmvtg RT, FRB` | `fmvtg RT, FRB, 0` | +| `fmvtg. RT, FRB` | `fmvtg RT, FRB, 1` | +| `fmvtgs RT, FRB` | `fmvtg RT, FRB, 2` | +| `fmvtgs. RT, FRB` | `fmvtg RT, FRB, 3` | -Since RT is a GPR, Rc=1 follows standard *integer* behaviour, i.e. -tests RT and sets CR0. 
+## GPR to FPR move -# GPR to FPR moves +`fmvfg FRT, RB, RCS` -`fmvfg FRT, RA` +| 0-5 | 6-10 | 11-15 | 16-20 | 21-29 | 30-31 | Form | +|-----|------|-------|-------|-------|-------|--------| +| PO | FRT | 0 | RB | XO | RCS | X-Form | -move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions -are raised, no flags are altered of any kind. +``` +if RCS[0] = 1 then # if Single mode + FRT <- DOUBLE((RB)[32:63]) # DOUBLE since that's what lfs uses +else + FRT <- (RB) +``` -Rc=1 tests FRT and sets CR1 +move a 32/64-bit float from a GPR to a FPR, just copying bits of the IEEE 754 representation directly. This is equivalent to `stw` followed by `lfs` or equivalent to `std` followed by `lfd`. As `fmvfg` is just copying bits, `FPSCR` is not affected in any way. -`fmvfgs FRT, RA` +Rc=1 tests FRT and sets CR1, exactly like all other Scalar Floating-Point +operations. -move a 32-bit float from a GPR to a FPR, just copying bits. Converts the -32-bit float in `RA` to a 64-bit float, using the same method as `lfs`, -then writes the 64-bit float to `FRT`. Effectively, `fmvfgs` is a -macro-fusion of `stw` and `lfs` and therefore no fp exception flags are set. +### Assembly Aliases -Rc=1 tests FRT and sets CR1, following usual fp Rc=1 semantics. +| Assembly Alias | Full Instruction | +|-------------------|--------------------| +| `fmvfg FRT, RB` | `fmvfg FRT, RB, 0` | +| `fmvfg. FRT, RB` | `fmvfg FRT, RB, 1` | +| `fmvfgs FRT, RB` | `fmvfg FRT, RB, 2` | +| `fmvfgs. FRT, RB` | `fmvfg FRT, RB, 3` | # Conversions @@ -337,58 +393,83 @@ these instructions perform conversions between Integer and Floating Point. Truncation can therefore occur, as well as exceptions. 
-Mode values: +## Floating-point Convert From GPR -| Mode | `rounding_mode` | Semantics | -|------|-----------------|----------------------------------| -| 000 | from `FPSCR` | [OpenPower semantics] | -| 001 | Truncate | [OpenPower semantics] | -| 010 | from `FPSCR` | [Java semantics] | -| 011 | Truncate | [Java semantics] | -| 100 | from `FPSCR` | [JavaScript semantics] | -| 101 | Truncate | [JavaScript semantics] | -| rest | -- | illegal instruction trap for now | +| 0-5 | 6-10 | 11-12 | 13-15 | 16-20 | 21-29 | 30-31 | Form | +|-----|------|-------|-------|-------|-------|-------|--------| +| PO | FRT | IT | 0 | RB | XO | RCS | X-Form | -[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics -[Java semantics]: #fp-to-int-java-conversion-semantics -[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics +`fcvtfg FRT, RB, IT, RCS` + +``` +if IT[0] = 0 and RCS[0] = 0 then # 32-bit int -> 64-bit float + # rounding never necessary, so don't touch FPSCR + # based off xvcvsxwdp + if IT = 0 then # Signed 32-bit + src <- bfp_CONVERT_FROM_SI32((RB)[32:63]) + else # IT = 1 -- Unsigned 32-bit + src <- bfp_CONVERT_FROM_UI32((RB)[32:63]) + FRT <- bfp64_CONVERT_FROM_BFP(src) +else + # rounding may be necessary + # based off xscvuxdsp + reset_xflags() + switch(IT) + case(0): # Signed 32-bit + src <- bfp_CONVERT_FROM_SI32((RB)[32:63]) + case(1): # Unsigned 32-bit + src <- bfp_CONVERT_FROM_UI32((RB)[32:63]) + case(2): # Signed 64-bit + src <- bfp_CONVERT_FROM_SI64((RB)) + default: # Unsigned 64-bit + src <- bfp_CONVERT_FROM_UI64((RB)) + if RCS[0] = 1 then # Single + rnd <- bfp_ROUND_TO_BFP32(FPSCR.RN, src) + result32 <- bfp32_CONVERT_FROM_BFP(rnd) + cls <- fprf_CLASS_BFP32(result32) + result <- DOUBLE(result32) + else + rnd <- bfp_ROUND_TO_BFP64(FPSCR.RN, src) + result <- bfp64_CONVERT_FROM_BFP(rnd) + cls <- fprf_CLASS_BFP64(result) + + if xx_flag = 1 then SetFX(FPSCR.XX) + + FRT <- result + FPSCR.FPRF <- cls + FPSCR.FR <- inc_flag + FPSCR.FI <- xx_flag +``` + 
+Convert from an unsigned/signed 32/64-bit integer in RB to a 32/64-bit float in FRT, following the usual 32-bit float in 64-bit float format. -## GPR to FPR conversions - -**Format** - -| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form | -|--------|------|--------|-------|-------|----|------| -| Major | FRT | //Mode | RA | XO | Rc |X-Form| - -All of the following GPR to FPR conversions use the rounding mode from `FPSCR`. - -* `fcvtfgw FRT, RA` - Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in - `FRT`. -* `fcvtfgws FRT, RA` - Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in - `FRT`. -* `fcvtfguw FRT, RA` - Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in - `FRT`. -* `fcvtfguws FRT, RA` - Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in - `FRT`. -* `fcvtfgd FRT, RA` - Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in - `FRT`. -* `fcvtfgds FRT, RA` - Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in - `FRT`. -* `fcvtfgud FRT, RA` - Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in - `FRT`. -* `fcvtfguds FRT, RA` - Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in - `FRT`. - -## FPR to GPR (Integer) conversions +If converting from an unsigned/signed 32-bit integer to a 64-bit float, rounding is never necessary, so `FPSCR` is unmodified and exceptions are never raised. Otherwise, `FPSCR` is modified and exceptions are raised as usual. + +Rc=1 tests FRT and sets CR1, exactly like all other Scalar Floating-Point +operations. + +### Assembly Aliases + +| Assembly Alias | Full Instruction | +|----------------------|------------------------| +| `fcvtfgw FRT, RB` | `fcvtfg FRT, RB, 0, 0` | +| `fcvtfgw. FRT, RB` | `fcvtfg FRT, RB, 0, 1` | +| `fcvtfgws FRT, RB` | `fcvtfg FRT, RB, 0, 2` | +| `fcvtfgws. FRT, RB` | `fcvtfg FRT, RB, 0, 3` | +| `fcvtfguw FRT, RB` | `fcvtfg FRT, RB, 1, 0` | +| `fcvtfguw.
FRT, RB` | `fcvtfg FRT, RB, 1, 1` | +| `fcvtfguws FRT, RB` | `fcvtfg FRT, RB, 1, 2` | +| `fcvtfguws. FRT, RB` | `fcvtfg FRT, RB, 1, 3` | +| `fcvtfgd FRT, RB` | `fcvtfg FRT, RB, 2, 0` | +| `fcvtfgd. FRT, RB` | `fcvtfg FRT, RB, 2, 1` | +| `fcvtfgds FRT, RB` | `fcvtfg FRT, RB, 2, 2` | +| `fcvtfgds. FRT, RB` | `fcvtfg FRT, RB, 2, 3` | +| `fcvtfgud FRT, RB` | `fcvtfg FRT, RB, 3, 0` | +| `fcvtfgud. FRT, RB` | `fcvtfg FRT, RB, 3, 1` | +| `fcvtfguds FRT, RB` | `fcvtfg FRT, RB, 3, 2` | +| `fcvtfguds. FRT, RB` | `fcvtfg FRT, RB, 3, 3` | + +## Floating-point to Integer Conversion Overview
@@ -443,15 +524,9 @@ For the sake of simplicity, the FP -> Integer conversion semantics generalized f This instruction is present in ARM assembler as FJCVTZS -**Format** - -| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form | -|--------|------|--------|-------|-------|----|------| -| Major | RT | //Mode | FRA | XO | Rc |X-Form| - **Rc=1 and OE=1** -All of these insructions have an Rc=1 mode which sets CR0 +All of these instructions have an Rc=1 mode which sets CR0 in the normal way for any instructions producing a GPR result. Additionally, when OE=1, if the numerical value of the FP number is not 100% accurately preserved (due to truncation or saturation @@ -459,36 +534,7 @@ and including when the FP number was NaN) then this is considered to be an integer Overflow condition, and CR0.SO, XER.SO and XER.OV are all set as normal for any GPR instructions that overflow. -**Instructions** - -* `fcvttgw RT, FRA, Mode` - Convert from 64-bit float to 32-bit signed integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctiw` or `fctiwz` -* `fcvttguw RT, FRA, Mode` - Convert from 64-bit float to 32-bit unsigned integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctiwu` or `fctiwuz` -* `fcvttgd RT, FRA, Mode` - Convert from 64-bit float to 64-bit signed integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctid` or `fctidz` -* `fcvttgud RT, FRA, Mode` - Convert from 64-bit float to 64-bit unsigned integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctidu` or `fctiduz` -* `fcvtstgw RT, FRA, Mode` - Convert from 32-bit float to 32-bit signed integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`] -* `fcvtstguw RT, FRA, Mode` - Convert from 32-bit float to 32-bit unsigned integer, writing the result - to the GPR `RT`. 
Converts using [mode `Mode`] -* `fcvtstgd RT, FRA, Mode` - Convert from 32-bit float to 64-bit signed integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`] -* `fcvtstgud RT, FRA, Mode` - Convert from 32-bit float to 64-bit unsigned integer, writing the result - to the GPR `RT`. Converts using [mode `Mode`] - -[mode `Mode`]: #fpr-to-gpr-conversion-mode - -## FP to Integer Conversion Pseudo-code +### FP to Integer Conversion Simplified Pseudo-code Key for pseudo-code: @@ -550,3 +596,164 @@ def fp_to_int_java_script(v: fp) -> int: return (int)bits ``` +## Floating-point Convert To GPR + +| 0-5 | 6-10 | 11-12 | 13-15 | 16-20 | 21-28 | 29 | 30 | 31 | Form | +|-----|------|-------|-------|-------|-------|--------|----|--------|---------| +| PO | RT | IT | CVM | FRB | XO | RCS[0] | OE | RCS[1] | XO-Form | + +`fcvttg RT, FRB, CVM, IT, RCS` +`fcvttgo RT, FRB, CVM, IT, RCS` + +``` +# based on xscvdpuxws +reset_xflags() + +if RCS[0] = 1 then # if Single mode + src <- bfp_CONVERT_FROM_BFP32(SINGLE((FRB))) +else + src <- bfp_CONVERT_FROM_BFP64((FRB)) + +switch(IT) + case(0): # Signed 32-bit + range_min <- bfp_CONVERT_FROM_SI32(0x8000_0000) + range_max <- bfp_CONVERT_FROM_SI32(0x7FFF_FFFF) + js_mask <- 0xFFFF_FFFF + case(1): # Unsigned 32-bit + range_min <- bfp_CONVERT_FROM_UI32(0) + range_max <- bfp_CONVERT_FROM_UI32(0xFFFF_FFFF) + js_mask <- 0xFFFF_FFFF + case(2): # Signed 64-bit + range_min <- bfp_CONVERT_FROM_SI64(-0x8000_0000_0000_0000) + range_max <- bfp_CONVERT_FROM_SI64(0x7FFF_FFFF_FFFF_FFFF) + js_mask <- 0xFFFF_FFFF_FFFF_FFFF + default: # Unsigned 64-bit + range_min <- bfp_CONVERT_FROM_UI64(0) + range_max <- bfp_CONVERT_FROM_UI64(0xFFFF_FFFF_FFFF_FFFF) + js_mask <- 0xFFFF_FFFF_FFFF_FFFF + +if CVM[2] = 1 or FPSCR.RN = 0b01 then + rnd <- bfp_ROUND_TO_INTEGER_TRUNC(src) +else if FPSCR.RN = 0b00 then + rnd <- bfp_ROUND_TO_INTEGER_NEAR_EVEN(src) +else if FPSCR.RN = 0b10 then + rnd <- bfp_ROUND_TO_INTEGER_CEIL(src) +else if FPSCR.RN = 0b11 then + rnd <- 
bfp_ROUND_TO_INTEGER_FLOOR(src) + +# set conversion flags +switch(IT) + case(0): # Signed 32-bit + si32_CONVERT_FROM_BFP(rnd) + case(1): # Unsigned 32-bit + ui32_CONVERT_FROM_BFP(rnd) + case(2): # Signed 64-bit + si64_CONVERT_FROM_BFP(rnd) + default: # Unsigned 64-bit + ui64_CONVERT_FROM_BFP(rnd) + +switch(CVM) + case(0, 1): # OpenPower semantics + if IsNaN(rnd) then + result <- si64_CONVERT_FROM_BFP(range_min) + else if bfp_COMPARE_GT(rnd, range_max) then + result <- ui64_CONVERT_FROM_BFP(range_max) + else if bfp_COMPARE_LT(rnd, range_min) then + result <- si64_CONVERT_FROM_BFP(range_min) + else if IT[1] = 1 then # Unsigned 32/64-bit + result <- ui64_CONVERT_FROM_BFP(rnd) + else # Signed 32/64-bit + result <- si64_CONVERT_FROM_BFP(rnd) + case(2, 3): # Java semantics + if IsNaN(rnd) then + result <- [0] * 64 + else if bfp_COMPARE_GT(rnd, range_max) then + result <- ui64_CONVERT_FROM_BFP(range_max) + else if bfp_COMPARE_LT(rnd, range_min) then + result <- si64_CONVERT_FROM_BFP(range_min) + else if IT[1] = 1 then # Unsigned 32/64-bit + result <- ui64_CONVERT_FROM_BFP(rnd) + else # Signed 32/64-bit + result <- si64_CONVERT_FROM_BFP(rnd) + default: # JavaScript semantics + # CVM = 6, 7 are illegal instructions + + if IsInf(rnd) or IsNaN(rnd) then + result <- [0] * 64 + else + # this works because the largest type we try to + # convert from has 53 significand bits, and the + # largest type we try to convert to has 64 bits, + # and the sum of those is strictly less than the + # 128 bits of the intermediate result.
+ result128 <- si128_CONVERT_FROM_BFP(rnd) + result <- result128[64:127] & js_mask + +switch(IT) + case(0): # Signed 32-bit + result <- EXTS64(result[32:63]) + result_bfp <- bfp_CONVERT_FROM_SI32(result[32:63]) + case(1): # Unsigned 32-bit + result <- EXTZ64(result[32:63]) + result_bfp <- bfp_CONVERT_FROM_UI32(result[32:63]) + case(2): # Signed 64-bit + result_bfp <- bfp_CONVERT_FROM_SI64(result) + default: # Unsigned 64-bit + result_bfp <- bfp_CONVERT_FROM_UI64(result) + +if vxsnan_flag = 1 then SetFX(FPSCR.VXSNAN) +if vxcvi_flag = 1 then SetFX(FPSCR.VXCVI) +if xx_flag = 1 then SetFX(FPSCR.XX) + +vx_flag <- vxsnan_flag | vxcvi_flag +vex_flag <- FPSCR.VE & vx_flag + +if vex_flag = 0 then + RT <- result + FPSCR.FPRF <- undefined + FPSCR.FR <- inc_flag + FPSCR.FI <- xx_flag + if IsNaN(src) or not bfp_COMPARE_EQ(src, result_bfp) then + overflow <- 1 # signals SO only when OE = 1 +else + FPSCR.FR <- 0 + FPSCR.FI <- 0 +``` + +Convert from a 32/64-bit float in FRB to an unsigned/signed 32/64-bit integer in RT, with the conversion overflow/rounding semantics following the chosen `CVM` value, following the usual 32-bit float in 64-bit float format. + +`FPSCR` is modified and exceptions are raised as usual. + +Both of these instructions have an Rc=1 mode which sets CR0 +in the normal way for any instructions producing a GPR result. +Additionally, when OE=1, if the numerical value of the FP number +is not 100% accurately preserved (due to truncation or saturation +and including when the FP number was NaN) then this is considered +to be an integer Overflow condition, and CR0.SO, XER.SO and XER.OV +are all set as normal for any GPR instructions that overflow. + +### Assembly Aliases + +For brevity, `[o]` is used to mean `o` is optional there. + +| Assembly Alias | Full Instruction | +|------------------------------|--------------------------------| +| `fcvttgw[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 0, 0` | +| `fcvttgw[o].
RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 0, 1` | +| `fcvtstgw[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 0, 2` | +| `fcvtstgw[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 0, 3` | +| `fcvttguw[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 1, 0` | +| `fcvttguw[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 1, 1` | +| `fcvtstguw[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 1, 2` | +| `fcvtstguw[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 1, 3` | +| `fcvttgd[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 2, 0` | +| `fcvttgd[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 2, 1` | +| `fcvtstgd[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 2, 2` | +| `fcvtstgd[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 2, 3` | +| `fcvttgud[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 3, 0` | +| `fcvttgud[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 3, 1` | +| `fcvtstgud[o] RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 3, 2` | +| `fcvtstgud[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 3, 3` | + +[mode `Mode`]: #fpr-to-gpr-conversion-mode + -- 2.30.2
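
As a cross-check of the `CVM` table and the `fcvttg` pseudo-code in this patch, the following is a minimal Python sketch of how the three truncating modes (`CVM` = 1, 3, 5) could pick a result. The name `fcvttg_model` and its parameters are hypothetical and not part of the patch. The sketch ignores `FPSCR`, `Rc` and `OE`, and it assumes that an in-range input yields the truncated value itself.

```python
import math


def fcvttg_model(value: float, cvm: int, bits: int = 32, signed: bool = True) -> int:
    """Hypothetical model of fcvttg result selection (truncating CVM values only).

    cvm = 1: OpenPower semantics, 3: Java semantics, 5: JavaScript semantics.
    FPSCR flags, Rc and OE are deliberately not modelled.
    """
    if cvm not in (1, 3, 5):
        raise ValueError("only the truncating CVM values 1, 3 and 5 are modelled")

    # Representable range of the target integer type (the IT field).
    range_min = -(1 << (bits - 1)) if signed else 0
    range_max = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1

    if cvm == 5:
        # JavaScript semantics: NaN and infinities give 0,
        # everything else wraps modulo 2**bits.
        if math.isnan(value) or math.isinf(value):
            return 0
        wrapped = math.trunc(value) & ((1 << bits) - 1)
        if signed and wrapped > range_max:
            wrapped -= 1 << bits
        return wrapped

    if math.isnan(value):
        # OpenPower returns range_min for NaN, Java returns 0.
        return range_min if cvm == 1 else 0

    # Truncate toward zero; infinities are left as-is so they saturate below.
    rnd = value if math.isinf(value) else math.trunc(value)
    if rnd > range_max:
        return range_max
    if rnd < range_min:
        return range_min
    return int(rnd)
```

For example, `fcvttg_model(float("nan"), 1)` returns the most negative 32-bit value, while the same input under Java or JavaScript semantics returns 0. Out-of-range inputs saturate under OpenPower and Java semantics but wrap under JavaScript semantics.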