From 1b14f6038a6a6524668ba4a4695f35b1eadac9ff Mon Sep 17 00:00:00 2001
From: Jacob Lifshay
Date: Thu, 2 Mar 2023 21:24:22 -0800
Subject: [PATCH] add maddedus to ls003

---
 openpower/sv/rfc/ls003.mdwn | 66 +++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/openpower/sv/rfc/ls003.mdwn b/openpower/sv/rfc/ls003.mdwn
index 05c5aec60..86ccbaac3 100644
--- a/openpower/sv/rfc/ls003.mdwn
+++ b/openpower/sv/rfc/ls003.mdwn
@@ -33,6 +33,7 @@ Instructions added
 
 ```
 maddedu - Multiply-Add Extended Double Unsigned
+maddedus - Multiply-Add Extended Double Unsigned/Signed
 divmod2du - Divide/Modulo Quad-Double Unsigned
 dsld - Double Shift Left Doubleword
 dsrd - Double Shift Right Doubleword
@@ -175,6 +176,70 @@ maddedu r22,r6,r0,r3
 
 \newpage{}
 
+# Multiply-Add Extended Double Unsigned/Signed
+
+`maddedus RT, RA, RB, RC`
+
+| 0-5   | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
+|-------|------|-------|-------|-------|-------|---------|
+| EXT04 | RT   | RA    | RB    | RC    | XO    | VA-Form |
+
+Pseudocode:
+
+```
+if (RB)[0] != 0 then # workaround no unsigned-signed mul op
+    prod[0:127] <- -((RA) * -(RB))
+else
+    prod[0:127] <- (RA) * (RB)
+sum[0:127] <- prod + EXTS128((RC))
+RT <- sum[64:127] # Store low half in RT
+RS <- sum[0:63] # RS implicit register, equal to RC
+```
+
+Special registers altered:
+
+    None
+
+The 64-bit operands are (RA), (RB), and (RC).
+(RC) is sign-extended to 128 bits and then summed with the
+128-bit product of zero-extended (RA) and sign-extended (RB).
+The low-order 64 bits of the 128-bit sum are
+placed into register RT.
+The high-order 64 bits of the 128-bit sum are
+placed into register RS.
+RS is implicitly defined as the same register as RC.
+
+*Programmer's Note:
+To achieve a big-integer rolling-accumulation effect:
+assuming the signed scalar to multiply is in r0, and r3 is
+used (effectively) as a 64-bit carry,
+the unsigned vector to multiply by starts at r4 and the signed result vector
+in r20, the instructions `maddedus r20,r4,r0,r3`,
+`maddedus r21,r5,r0,r3` etc. may be issued, where the first `maddedus` will have
+stored the upper half of the 128-bit multiply-add result into r3, such
+that it may be picked up by the second `maddedus`. Repeat inline
+to construct a larger bigint scalar-vector multiply,
+as Scalar GPR register file space permits. If register
+spill is required then r3, as the effective 64-bit carry,
+continues the chain.*
+
+Examples:
+
+```
+# (r0 * r1) + r2, store lower in r4, upper in r2
+maddedus r4, r0, r1, r2
+
+# Chaining together for larger bigint (see Programmer's Note above)
+# r3 starts with zero (no carry-in)
+maddedus r20,r4,r0,r3
+maddedus r21,r5,r0,r3
+maddedus r22,r6,r0,r3
+```
+
+----------
+
+\newpage{}
+
 # Divide/Modulo Quad-Double Unsigned
 
 **Should name be Divide/Module Double Extended Unsigned?**
@@ -389,6 +454,7 @@ XO (26:30)
 |Form| Book | Page | Version | mnemonic | Description |
 |----|------|------|---------|----------|-------------|
 |VA | I | # | 3.2B |maddedu | Multiply-Add Extend Double Unsigned |
+|VA | I | # | 3.2B |maddedus | Multiply-Add Extend Double Unsigned/Signed |
 |VA | I | # | 3.2B |divmod2du | Divide/Modulo Quad-Double Unsigned |
 |VA2 | I | # | 3.2B |dsld | Double-Shift Left Doubleword |
 |VA2 | I | # | 3.2B |dsrd | Double-Shift Right Doubleword |
-- 
2.30.2
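
Review aid: the `maddedus` pseudocode and the chained example in the Programmer's Note can be checked by hand with a small model. The following is a minimal Python sketch, assuming registers are modelled as unsigned 64-bit integers; the helper names `to_signed64`, `maddedus` and `mul_vector_by_signed_scalar` are hypothetical and are not part of the RFC or of any simulator.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(x):
    """Interpret the low 64 bits of x as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def maddedus(ra, rb, rc):
    """Hypothetical model of maddedus; returns (RT, RS) as unsigned 64-bit values.

    (RA) is zero-extended, (RB) and (RC) are sign-extended, as in the pseudocode.
    """
    prod = (ra & MASK64) * to_signed64(rb)      # unsigned * signed product
    total = (prod + to_signed64(rc)) & MASK128  # 128-bit two's-complement sum
    rt = total & MASK64                         # sum[64:127], low half
    rs = (total >> 64) & MASK64                 # sum[0:63], high half, lands in RC
    return rt, rs


def mul_vector_by_signed_scalar(limbs, scalar):
    """Chain maddedus over little-endian unsigned 64-bit limbs.

    Returns len(limbs) + 1 limbs; the last one is the final (signed) carry.
    """
    carry = 0  # plays the role of r3, which starts at zero (no carry-in)
    out = []
    for limb in limbs:
        rt, carry = maddedus(limb, scalar, carry)
        out.append(rt)
    out.append(carry)
    return out
```

In this sketch the carry is treated as signed on every step, which mirrors `EXTS128((RC))` in the pseudocode; the top limb of the result therefore has to be read as a signed value when reassembling the full product.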