
[[!tag standards]]

# Big Integer Arithmetic

**DRAFT STATUS** 19apr2021

(see [[discussion]] page for notes)

BigNum arithmetic is extremely common, especially in cryptography,
where for example RSA relies on arithmetic on operands of 2048 or 4096
bits in length.  The primary operations are add, multiply and divide
(and modulo), with specialisations of subtract and signed multiply.

A reminder that a particular focus of SVP64 is that it is built on
top of Scalar operations, where those scalar operations are useful in
their own right without SVP64. Thus the operations here are proposed
first as Scalar Extensions to the Power ISA.

A secondary focus is that, if Vectorised, implementors may choose
to deploy macro-op fusion targeting back-end 256-bit or greater
Dynamic SIMD ALUs for maximum performance and effectiveness.

# Analysis

The full analysis is in [[biginteger/analysis]]. In summary, standard `adde`
is sufficient for SVP64 Vectorisation of big-integer addition (and `subfe`
for subtraction), but big-integer multiply and divide require an
extra 3-in 2-out instruction, similar to Intel's `mulx`, to be efficient.
The same instruction (`madded`) is used for both because the primary
purpose of `madded` is to perform a fused multiply of a 64-bit scalar with
a large vector, where that result is Big-Added for Big-Multiply, but
Big-Subtracted for Big-Divide.

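The two building blocks summarised above can be sketched in Python as a
behavioural model, not as an implementation of the instructions themselves.
The names `big_add`, `big_mul_scalar`, `to_limbs` and `from_limbs` are
hypothetical, and the limb order (least-significant 64-bit limb first) is an
assumption made only for this illustration. Each loop iteration stands in for
one element of a Vectorised `adde` or `madded`.

```python
MASK64 = (1 << 64) - 1

def to_limbs(n, count):
    """Split a non-negative integer into `count` 64-bit limbs, LSB limb first."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]

def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def big_add(a, b, carry=0):
    """Model of an adde chain over two equal-length limb vectors."""
    if len(a) != len(b):
        raise ValueError("limb vectors must have equal length")
    out = []
    for x, y in zip(a, b):
        total = x + y + carry          # 1-bit carry in, 1-bit carry out
        out.append(total & MASK64)
        carry = total >> 64
    return out, carry

def big_mul_scalar(a, b, carry=0):
    """Model of a limb vector times one 64-bit scalar.

    Each step is 3-in (limb, scalar, carry-in) and 2-out (result limb,
    64-bit carry-out), which is the shape the text attributes to madded.
    """
    out = []
    for x in a:
        total = x * b + carry          # always fits in 128 bits
        out.append(total & MASK64)
        carry = total >> 64
    return out, carry
```

A full Big-Multiply would, under these assumptions, call `big_mul_scalar`
once per limb of the second operand and accumulate the shifted partial
results with `big_add`.
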
Macro-op Fusion and back-end massively-wide SIMD ALUs may be deployed in a
fashion that is hidden from the user, behind a consistent, stable ISA API.

# madded

**DRAFT**

`madded` is VA-Form:

|0.....5|6..10|11..15|16..20|21..25|26..31|
|-------|-----|------|------|------|------|
| EXT04 | RT  |  RA  |  RB  |   RC |  XO  |

The pseudocode for `madded RT, RA, RB, RC` is:

    prod[0:127] <- (RA) * (RB)
    sum[0:127] <- EXTZ(RC) + prod
    RT <- sum[64:127]
    RS <- sum[0:63] # implicit RS: RT+1 when Scalar; RC or RT+VL under SVP64

RC is zero-extended (not shifted) to 128 bits and the 128-bit product
is added to it; the lower half of that result is stored in RT and the
upper half in RS.

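As a rough illustration of the pseudocode above, the scalar behaviour can be
modelled in Python using arbitrary-precision integers. The function name
`madded` and the tuple return value are conventions of this sketch only; the
instruction itself writes two registers.

```python
MASK64 = (1 << 64) - 1

def madded(ra, rb, rc):
    """Behavioural model: return (RT, RS) for unsigned 64-bit inputs."""
    for value in (ra, rb, rc):
        if not 0 <= value <= MASK64:
            raise ValueError("operands must be unsigned 64-bit values")
    # Zero-extended RC plus the full 128-bit product. The largest possible
    # total is (2**64 - 1)**2 + (2**64 - 1) = 2**128 - 2**64, so the sum
    # never overflows 128 bits.
    total = ra * rb + rc
    rt = total & MASK64    # sum[64:127], the lower half
    rs = total >> 64       # sum[0:63], the upper half
    return rt, rs
```

With all three inputs at their maximum, the lower half is zero and the upper
half is all ones, which shows that the result always fits in the RT/RS pair.
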
The difference from `maddhdu` is that `maddhdu` stores only the upper
half, in RT, whereas `madded` stores the lower half in RT and the upper
half in RS. There is no equivalent to `maddld` because `maddld` performs
sign-extension on RC.

As a Scalar Power ISA operation, RS=RT+1, as with `lq` and `stq`.
SVP64 overrides this behaviour.
For SVP64 EXTRA register extension, the `RM-1P-3S-1D` format is
used, with one additional bit (`EXTRA2_MODE`) that determines RS.

| Field Name    | Field bits | Description                         |
|---------------|------------|-------------------------------------|
| Rdest\_EXTRA2 | `10:11`    | extends RT (R\*\_EXTRA2 Encoding)   |
| Rsrc1\_EXTRA2 | `12:13`    | extends RA (R\*\_EXTRA2 Encoding)   |
| Rsrc2\_EXTRA2 | `14:15`    | extends RB (R\*\_EXTRA2 Encoding)   |
| Rsrc3\_EXTRA2 | `16:17`    | extends RC (R\*\_EXTRA2 Encoding)   |
| EXTRA2\_MODE  | `18`       | used by `madded` for determining RS |

When `EXTRA2_MODE` is set to zero, the implicit RS register takes
its Vector/Scalar setting from Rdest_EXTRA2 and its register number
from RT, but all numbering is offset by VL. *Note that element-width
overrides influence this offset* (see SVP64 [[svp64/appendix]] for full
details).

When `EXTRA2_MODE` is set to one, the implicit RS register is identical
to RC extended to SVP64 numbering, including whether RC is set Scalar or
Vector.

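One possible reading of the two `EXTRA2_MODE` settings is sketched below.
This is an interpretation of the text, not a normative definition: the helper
names `implicit_rs` and `element_register` are hypothetical, the register
numbers are assumed to be already extended to SVP64 numbering, and
element-width overrides (which the text says influence the offset) are
deliberately ignored.

```python
def implicit_rs(rt, rt_is_vector, rc, rc_is_vector, vl, extra2_mode):
    """Return (register number, is_vector) for the implicit RS.

    Assumes default element width, so the offset is exactly VL registers.
    """
    if extra2_mode == 0:
        # RS follows RT's Vector/Scalar setting, numbered VL above RT.
        return rt + vl, rt_is_vector
    # RS is the same register as RC, including its Vector/Scalar setting.
    return rc, rc_is_vector

def element_register(base, is_vector, index):
    """Register used for element `index`: vectors step, scalars do not."""
    return base + index if is_vector else base
```

Under this reading, a vector RT starting at register 8 with VL=4 and
`EXTRA2_MODE=0` places the RS elements in registers 12 to 15, directly after
the RT elements. With `EXTRA2_MODE=1` and a scalar RC, every element reads and
writes the same register, which is what allows it to act as a running 64-bit
carry.
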
# divrem2du RT,RA,RB,RC

**DRAFT**

Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, are taken
  from a register, RC.
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `madded`)

RB, the divisor, remains 64 bit.  The instruction is therefore a 128/64
division, producing a pair of 64-bit results (quotient and modulo).
Overflow conditions are detected in exactly the same fashion as `divdeu`,
except that rather than have `UNDEFINED` behaviour, RT is set to all ones
and RS set to all zeros.

For SVP64, given that this instruction is also 3-in 2-out 64-bit registers,
the exact same EXTRA format and setting of RS is used as for `sv.madded`.
For Scalar usage, just as for `madded`, `RS=RT+1` (similar to `lq` and `stq`).

Pseudo-code:

    if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then
        dividend[0:(XLEN*2)-1] <- (RA) || (RC)
        divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB)
        result <- dividend / divisor
        modulo <- dividend % divisor
        RT <- result[XLEN:(XLEN*2)-1]
        RS <- modulo[XLEN:(XLEN*2)-1]
        overflow <- 0
    else
        overflow <- 1
        RT <- [1]*XLEN
        RS <- [0]*XLEN

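The pseudo-code above can be modelled in Python for XLEN=64, together with a
sketch of how chaining the modulo through successive limbs might give a
big-integer division by a single 64-bit word. The names `divrem2du` (as a
Python function returning a tuple) and `big_div_word` are conventions of this
illustration, and the most-significant-limb-first ordering is an assumption.

```python
MASK64 = (1 << 64) - 1

def divrem2du(ra, rb, rc):
    """Behavioural model: return (RT, RS, overflow) for XLEN=64."""
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc       # (RA) || (RC)
        # ra < rb guarantees the quotient fits in 64 bits.
        return dividend // rb, dividend % rb, 0
    return MASK64, 0, 1                  # overflow: RT all ones, RS all zeros

def big_div_word(limbs, divisor):
    """Divide a limb vector (most-significant limb first) by a 64-bit word.

    Returns (quotient limbs, remainder). The remainder of each step becomes
    the upper half of the next dividend, so it is always below the divisor
    and the overflow case is never reached.
    """
    if not 0 < divisor <= MASK64:
        raise ValueError("divisor must be a non-zero unsigned 64-bit value")
    remainder = 0
    quotient = []
    for limb in limbs:
        q, remainder, overflow = divrem2du(remainder, divisor, limb)
        assert overflow == 0
        quotient.append(q)
    return quotient, remainder
```

Because the quotient and modulo come out of one step together, the loop needs
no separate multiply-and-subtract to recover the remainder between limbs.
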
# [DRAFT] EXT04 Proposed Map

For the Opcode map (XO Field)
see Power ISA v3.1, Book III, Appendix D, Table 13 (sheet 7 of 8), p1357.
Proposed is the addition of `madded` (**DRAFT, NOT APPROVED**) in `110010`
and `divrem2du` in `110100`.

|110000|110001 |110010    |110011|110100       |110101|110110|110111|
|------|-------|----------|------|-------------|------|------|------|
|maddhd|maddhdu|**madded**|maddld|**divrem2du**|rsvd  |rsvd  |rsvd  |

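To make the proposed map concrete, the sketch below packs the VA-Form fields
into a 32-bit instruction word, using the Power ISA convention that bit 0 is
the most significant bit. The XO values are the draft, unapproved proposals
from the table above and may change; the helper name `encode_va_form` and the
constant names are hypothetical.

```python
EXT04 = 4                   # primary opcode shared with maddhd, maddhdu, maddld
XO_MADDED = 0b110010        # DRAFT proposal, not approved
XO_DIVREM2DU = 0b110100     # DRAFT proposal, not approved

def encode_va_form(rt, ra, rb, rc, xo, po=EXT04):
    """Pack VA-Form fields: PO 0..5, RT 6..10, RA 11..15, RB 16..20,
    RC 21..25, XO 26..31 (MSB0 bit numbering)."""
    for name, value in (("RT", rt), ("RA", ra), ("RB", rb), ("RC", rc)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must be a 5-bit register number")
    if not 0 <= xo < 64 or not 0 <= po < 64:
        raise ValueError("PO and XO are 6-bit fields")
    return (po << 26) | (rt << 21) | (ra << 16) | (rb << 11) | (rc << 6) | xo
```

For example, `encode_va_form(3, 4, 5, 6, XO_MADDED)` would correspond to
`madded 3, 4, 5, 6` under the draft encoding.
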