# RFC ls003 Big Integer

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls003/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=960>
* <https://git.openpower.foundation/isa/PowerISA/issues/91>

**Severity**: Major

**Status**: New

**Date**: 20 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**: **UPDATE**

```
    Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    maddedu - Multiply-Add Extended Double Unsigned
    divmod2du - Divide/Modulo Quad-Double Unsigned
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Big-integer, Double-word
```

**Motivation**

`maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
rolling accumulation effect: `RC` effectively becomes a 64-bit carry in chains
of highly-efficient loop-unrolled arbitrary-length big-integer operations.
`divmod2du` is similar to `divdeu`, and has similar advantages to `maddedu`:
the modulo result is available with the quotient in a single instruction,
allowing highly-efficient arbitrary-length big-integer division.

**Notes and Observations**:

1. There is no need for an Rc=1 variant as VA-Form is being used.
2. There is no need for Special Registers as VA-Form is being used.
3. Both instructions have been present in Intel x86 for several decades.
4. Neither instruction is present in VSX: these are 128/64 whereas
   VSX is 128/128.
5. `maddedu` and `divmod2du` are full inverses of each other, including
   when used for arbitrary-length big-integer arithmetic.
6. These are both 3-in 2-out instructions. If Power ISA did not already
   have LD/ST-with-update instructions and instructions with `RAp`
   and `RTp` then these instructions would not be proposed.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.9.1

----------------

\newpage{}

# Multiply-Add Extended Double Unsigned

`maddedu RT, RA, RB, RC`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |

Pseudo-code:

```
prod[0:127] <- (RA) * (RB)    # Multiply RA and RB, result 128-bit
sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
RT <- sum[64:127]             # Store low half in RT
RS <- sum[0:63]               # Store high half in RS (implicitly RC)
```

Special registers altered:

    None

RC is zero-extended to 128 bits (not shifted, not sign-extended) and the
128-bit product is added to it. The lower half of that result is stored in
RT and the upper half in RS.

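The arithmetic above can be modelled in a few lines of Python, whose
integers are arbitrary-precision. This is only an illustrative sketch, not
part of the proposal: the function name `maddedu` and the constant `MASK64`
are chosen here for illustration, and the (RT, RS) pair is returned as a
tuple rather than written to registers.

```python
MASK64 = (1 << 64) - 1  # all-ones 64-bit value (illustrative helper)

def maddedu(ra, rb, rc):
    """Illustrative model: return (RT, RS) for unsigned 64-bit RA, RB, RC."""
    for operand in (ra, rb, rc):
        assert 0 <= operand <= MASK64, "operands must be unsigned 64-bit"
    total = ra * rb + rc              # zero-extended RC plus 128-bit product
    return total & MASK64, total >> 64  # (low half -> RT, high half -> RS)

# Worst case: the sum still fits in 128 bits, so nothing is lost.
rt, rs = maddedu(MASK64, MASK64, MASK64)
assert (rs << 64) | rt == MASK64 * MASK64 + MASK64
```

Because the largest possible sum is (2^64-1)^2 + (2^64-1) = 2^128 - 2^64,
the pair (RT, RS) always holds the exact result with no carry-out.
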
The difference from `maddhdu` is that `maddhdu` stores the upper
half in RT and discards the lower half, whereas `maddedu` stores the
lower half in RT and the upper half in RS.

The value stored in RT is exactly equivalent to that of `maddld`, despite
`maddld` performing sign-extension on RC, because RT is the full mathematical
result modulo 2^64 and sign/zero extension from 64 to 128 bits produces
identical results modulo 2^64. This is why there is no `maddldu` instruction.

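The modulo-2^64 argument can be checked numerically. The sketch below is
an assumption-laden illustration rather than a definition of either
instruction: `rt_unsigned` and `rt_signed` are hypothetical helper names
that compute the low 64 bits using unsigned (zero-extended) and signed
(sign-extended) interpretations of the same bit patterns.

```python
MASK64 = (1 << 64) - 1  # illustrative helper constant

def to_signed64(x):
    """Reinterpret an unsigned 64-bit value as two's complement signed."""
    return x - (1 << 64) if x >> 63 else x

def rt_unsigned(ra, rb, rc):
    """Low 64 bits with all operands treated as unsigned."""
    return (ra * rb + rc) & MASK64

def rt_signed(ra, rb, rc):
    """Low 64 bits with all operands treated as signed."""
    return (to_signed64(ra) * to_signed64(rb) + to_signed64(rc)) & MASK64

# Compare both interpretations over a set of boundary values.
samples = [0, 1, 0x7FFF_FFFF_FFFF_FFFF, 0x8000_0000_0000_0000, MASK64]
mismatches = [
    (a, b, c)
    for a in samples for b in samples for c in samples
    if rt_unsigned(a, b, c) != rt_signed(a, b, c)
]
assert mismatches == []
```
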
RS is implicitly defined as the same register as RC.

*Programmer's Note:
As a Scalar Power ISA operation, RS=RC.
To achieve a big-integer rolling-accumulation effect:
assuming the scalar to multiply is in r0, r3 is used (effectively)
as a 64-bit carry, the vector to multiply by starts at r4 and the
result vector in r20, instructions may be issued `maddedu r20,r4,r0,r3`
`maddedu r21,r5,r0,r3` etc. where the first `maddedu` will have
stored the upper half of the 128-bit multiply into r3, such
that it may be picked up by the second `maddedu`. Repeat inline
to construct a larger bigint scalar-vector multiply,
as Scalar GPR register file space permits.*

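The rolling-accumulation chain can be sketched in Python as follows. This
is a behavioural model under stated assumptions, not the proposed
implementation: `maddedu` and `bigint_mul_word` are illustrative names,
the big integer is held as a little-endian list of 64-bit limbs, and the
high half produced by each step is simply passed as the RC input of the
next step instead of travelling through a register.

```python
MASK64 = (1 << 64) - 1  # illustrative helper constant

def maddedu(ra, rb, rc):
    """Illustrative model: (low, high) halves of ra * rb + rc."""
    total = ra * rb + rc
    return total & MASK64, total >> 64

def bigint_mul_word(limbs, scalar):
    """Multiply a little-endian list of 64-bit limbs by a 64-bit scalar."""
    carry = 0                      # plays the role of the 64-bit carry in RC
    result = []
    for limb in limbs:
        low, carry = maddedu(limb, scalar, carry)  # high half feeds next step
        result.append(low)
    result.append(carry)           # final carry is the top limb
    return result

def limbs_to_int(limbs):
    """Convert little-endian 64-bit limbs to a Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

x = [MASK64, MASK64, 1]
product = bigint_mul_word(x, MASK64)
assert limbs_to_int(product) == limbs_to_int(x) * MASK64
```
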
Examples:

```
# (r0 * r1) + r2, store lower in r4, upper in r2
maddedu r4, r0, r1, r2
```

# Divide/Modulo Quad-Double Unsigned

**Should name be Divide/Modulo Double Extended Unsigned?**

**Check the pseudo-code comments**

`divmod2du RT, RA, RB, RC`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |

Pseudo-code:

```
if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then   # Check RA<RB (no overflow), RB nonzero
    dividend[0:(XLEN*2)-1] <- (RA) || (RC)    # Concatenate RA (high), RC (low)
    divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB) # Zero-extend RB to 2*XLEN bits
    result <- dividend / divisor              # Division
    modulo <- dividend % divisor              # Modulo
    RT <- result[XLEN:(XLEN*2)-1]             # Store quotient in RT
    RS <- modulo[XLEN:(XLEN*2)-1]             # Store modulo in RS (implicitly RC)
else                                          # Overflow or divide-by-zero
    RT <- [1]*XLEN                            # RT all 1's
    RS <- [0]*XLEN                            # RS all 0's
```

Special registers altered:

    None

Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, contain
  the value of a register, RC.
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `maddedu`).

RB, the divisor, remains 64-bit. The instruction is therefore a 128/64
division, producing a pair of 64-bit results, in the same way that
Intel [divq](https://www.felixcloutier.com/x86/div) works.
Overflow conditions are detected in exactly the same fashion as `divdeu`,
except that rather than have `UNDEFINED` behaviour, RT is set to all ones
and RS set to all zeros on overflow.

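The behaviour described above, including the overflow case, can be
modelled in Python. This is a minimal sketch assuming XLEN=64; the
function name `divmod2du` and the constant `MASK64` are illustrative, and
the (RT, RS) pair is returned as a tuple rather than written to registers.

```python
MASK64 = (1 << 64) - 1  # all-ones 64-bit value (illustrative helper)

def divmod2du(ra, rb, rc):
    """Illustrative model (XLEN=64): return (RT, RS) = (quotient, modulo)."""
    for operand in (ra, rb, rc):
        assert 0 <= operand <= MASK64, "operands must be unsigned 64-bit"
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc     # RA is the high half, RC the low half
        # ra < rb guarantees the quotient fits in 64 bits
        return dividend // rb, dividend % rb
    return MASK64, 0                   # overflow or divide-by-zero

assert divmod2du(0, 7, 100) == (14, 2)     # plain 64-bit case
assert divmod2du(1, 2, 0) == (1 << 63, 0)  # 2^64 / 2
assert divmod2du(5, 5, 0) == (MASK64, 0)   # RA >= RB: overflow result
```

The `ra < rb` test is sufficient for safety because the dividend is then
at most `rb * 2^64 - 1`, so the quotient cannot exceed `2^64 - 1`.
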
*Programmer's note: there are no Rc variants of any of these VA-Form
instructions. `cmpi` will need to be used to detect overflow conditions:
the saving in instruction count is that both RT and RS will have already
been set to useful values (all 1s and all zeros respectively)
needed as part of implementing Knuth's Algorithm D.*

For Scalar usage, just as for `maddedu`, `RS=RC`.

Examples:

```
# ((r0 << 64) + r2) / r1, store in r4
# ((r0 << 64) + r2) % r1, store in r2
divmod2du r4, r0, r1, r2
```

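As a sketch of how the two instructions mirror each other in big-integer
arithmetic, the Python model below divides a multi-limb integer by a
64-bit word using a chain of `divmod2du` steps, then rebuilds the original
with a chain of `maddedu` steps. All function names are illustrative
assumptions, limbs are little-endian 64-bit values, and register
allocation (how each remainder reaches RA of the next step) is
deliberately not modelled.

```python
MASK64 = (1 << 64) - 1  # illustrative helper constant

def maddedu(ra, rb, rc):
    """Illustrative model: (low, high) halves of ra * rb + rc."""
    total = ra * rb + rc
    return total & MASK64, total >> 64

def divmod2du(ra, rb, rc):
    """Illustrative model: (quotient, modulo) of (ra:rc) / rb, XLEN=64."""
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc
        return dividend // rb, dividend % rb
    return MASK64, 0

def bigint_divmod_word(limbs, divisor):
    """Divide little-endian limbs by a nonzero 64-bit divisor."""
    assert 0 < divisor <= MASK64
    remainder = 0
    quotient = [0] * len(limbs)
    for i in reversed(range(len(limbs))):   # most significant limb first
        # remainder < divisor always holds, so overflow is never triggered
        quotient[i], remainder = divmod2du(remainder, divisor, limbs[i])
    return quotient, remainder

def bigint_muladd_word(limbs, scalar, carry):
    """Compute limbs * scalar + carry, least significant limb first."""
    result = []
    for limb in limbs:
        low, carry = maddedu(limb, scalar, carry)
        result.append(low)
    return result, carry

x = [MASK64, 12345, MASK64 - 1]
d = 0xFFFF_FFFF_0000_0001
q, r = bigint_divmod_word(x, d)
rebuilt, top = bigint_muladd_word(q, d, r)
assert rebuilt == x and top == 0            # multiply-add undoes the division
```
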
[[!tag opf_rfc]]

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic  | Description |
|------|------|------|---------|-----------|-------------|
| VA   | I    | #    | 3.0B    | maddedu   | Multiply-Add Extended Double Unsigned |
| VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned |
