[[!tag standards]]

# Big Integer Arithmetic

**DRAFT STATUS** 19apr2021

BigNum arithmetic is extremely common, especially in cryptography,
where for example RSA relies on arithmetic of 2048 or 4096 bits
in length. The primary operations are add, multiply and divide
(and modulo), with specialisations of subtract and signed multiply.

A reminder that a particular focus of SVP64 is that it is built on
top of Scalar operations, where those scalar operations are useful in
their own right without SVP64. Thus the operations here are proposed
first as Scalar Extensions to the Power ISA.

A secondary focus is that, if Vectorised, implementors may choose
to deploy macro-op fusion targeting back-end 256-bit or greater
Dynamic SIMD ALUs for maximum performance and effectiveness.

# Analysis

Covered in [[biginteger/analysis]], the summary is that standard `adde` is sufficient
for SVP64 Vectorisation of big-integer addition (and `subfe` for
subtraction) but that big-integer multiply and divide
require two extra 3-in 2-out instructions, similar to Intel's `mulx`,
to be efficient. Macro-op Fusion and back-end massively-wide SIMD ALUs
may be deployed in a fashion that is hidden from the user, behind a
consistent, stable ISA API.
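As a rough illustration of why a carry-in/carry-out add is enough for big-integer addition, the following Python sketch models a chain of `adde`-style operations over 64-bit limbs. The helper names `adde` and `bigint_add`, the least-significant-limb-first ordering and the use of a plain integer for the carry (CA) are assumptions made for illustration only; they are not part of the proposal.

```python
MASK64 = (1 << 64) - 1

def adde(ra, rb, ca):
    """Model of a 64-bit add-extended: returns (sum, carry_out)."""
    total = ra + rb + ca
    return total & MASK64, total >> 64

def bigint_add(a_limbs, b_limbs, ca=0):
    """Add two equal-length limb lists (least-significant limb first)."""
    result = []
    for ra, rb in zip(a_limbs, b_limbs):
        # The carry out of each limb feeds the next limb, as CA would.
        limb, ca = adde(ra, rb, ca)
        result.append(limb)
    return result, ca
```

Each loop iteration corresponds to one element of a Vectorised `adde`, with the carry passed from element to element.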

# Instructions

**DRAFT**

Both `madded` and `msubed` are VA-Form:

|0.....5|6..10|11..15|16..20|21..25|26..31|
|-------|-----|------|------|------|------|
| EXT04 | RT  | RA   | RB   | RC   | XO   |

For the Opcode map (XO Field)
see Power ISA v3.1, Book III, Appendix D, Table 13 (sheet 7 of 8), p1357.
The proposed addition is `msubed` (**DRAFT, NOT APPROVED**), at XO
`110110`. A corresponding `madded` is proposed at `110010`.

| 110000 | 110001  | 110010 | 110011 | 110100 | 110101 | 110110 | 110111 |
| ------ | ------- | ------ | ------ | ------ | ------ | ------ | ------ |
| maddhd | maddhdu | madded | maddld | rsvd   | rsvd   | msubed | rsvd   |

For SVP64 EXTRA register extension, the `RM-1P-3S-1D` format is
used, with one additional bit that determines RS.

| Field Name    | Field bits | Description                            |
|---------------|------------|----------------------------------------|
| Rdest\_EXTRA2 | `10:11`    | extends RT (R\*\_EXTRA2 Encoding)      |
| Rsrc1\_EXTRA2 | `12:13`    | extends RA (R\*\_EXTRA2 Encoding)      |
| Rsrc2\_EXTRA2 | `14:15`    | extends RB (R\*\_EXTRA2 Encoding)      |
| Rsrc3\_EXTRA2 | `16:17`    | extends RC (R\*\_EXTRA2 Encoding)      |
| EXTRA2\_MODE  | `18`       | used by `msubed` and `madded` for RS   |

When `EXTRA2_MODE` is set to zero, the implicit RS register takes
its Vector/Scalar setting from Rdest\_EXTRA2 and its register number
from RT, but all numbering
is offset by VL. *Note that element-width overrides influence this
offset* (see SVP64 [[svp64/appendix]] for full details).

When `EXTRA2_MODE` is set to one, the implicit RS register is identical
to RC extended to SVP64 numbering, including whether RC is set Scalar or
Vector.
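The two `EXTRA2_MODE` cases can be sketched as a small register-selection function. This is a simplified model, not the normative rule: the helper name `implicit_rs` and its parameters are hypothetical, `rt` and `rc` are assumed to be already-extended SVP64 register numbers, element-width overrides are ignored, and the element index is assumed to be added only when the governing register (RT in mode 0, RC in mode 1) is marked Vector.

```python
def implicit_rs(rt, rc, vl, extra2_mode, element=0, is_vector=True):
    """Return the register number written as RS for one element.

    Hypothetical helper: ignores element-width overrides and assumes
    rt/rc are already-extended SVP64 register numbers.
    """
    # A Scalar register does not step with the element index (assumption).
    step = element if is_vector else 0
    if extra2_mode == 0:
        # RS follows RT's numbering, offset by VL.
        return rt + vl + step
    # RS is the same register as RC.
    return rc + step
```

For example, with `rt=8`, `rc=20` and `vl=4`, element 2 writes RS to register 14 in mode 0 and to register 22 in mode 1.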

## msubed

The pseudocode for `msubed RT, RA, RB, RC` is:

    prod[0:127] = (RA) * (RB)
    sub[0:127] = EXTZ(RC) - prod
    RT <- sub[64:127]
    RS <- sub[0:63] # RS is either RC or RT+VL

Note that RC is zero-extended to 128 bits, not sign-extended. In a Vector Loop
it contains the top half of the previous multiply-with-subtract,
and the current product must be subtracted from it.
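A minimal Python model of the `msubed` pseudocode may help when checking corner cases. It assumes all three operands are unsigned 64-bit values and that the subtraction wraps modulo 2^128; the function name and the tuple return value are illustrative choices, not part of the proposal.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def msubed(ra, rb, rc):
    """Return (RT, RS) for unsigned 64-bit inputs ra, rb, rc."""
    prod = ra * rb                 # full 128-bit unsigned product
    sub = (rc - prod) & MASK128    # EXTZ(RC) - prod, wrapped to 128 bits
    rt = sub & MASK64              # sub[64:127]: low half (MSB0 numbering)
    rs = sub >> 64                 # sub[0:63]: high half
    return rt, rs
```

When the product exceeds RC the 128-bit result wraps, so for instance `msubed(1, 1, 0)` yields all-ones in both RT and RS.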

## madded

The pseudocode for `madded RT, RA, RB, RC` is:

    prod[0:127] = (RA) * (RB)
    sum[0:127] = EXTZ(RC) + prod
    RT <- sum[64:127]
    RS <- sum[0:63] # RS is either RC or RT+VL

Again RC is zero-extended (not shifted) and the 128-bit product is added
to it; the lower half of the result is stored in RT and the upper half
in RS.

The difference from `maddhdu` is that `maddhdu` stores the upper
half in RT, whereas `madded` stores the lower half in RT and the upper
half in RS. There is no
equivalent to `maddld` because `maddld` performs sign-extension on RC.
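The `madded` pseudocode can be modelled in a few lines of Python, together with a sketch of how chaining RS into the next element's RC gives a big-integer multiply by a single 64-bit value. The names `madded` and `bigmul_by_scalar`, the least-significant-limb-first ordering and the assumption that RS and RC name the same register (the `EXTRA2_MODE` set to one case) are illustrative, not normative.

```python
MASK64 = (1 << 64) - 1

def madded(ra, rb, rc):
    """Return (RT, RS): low and high halves of ra * rb + rc (unsigned)."""
    total = ra * rb + rc           # always fits in 128 bits for 64-bit inputs
    return total & MASK64, total >> 64

def bigmul_by_scalar(a_limbs, b):
    """Multiply a limb list (least-significant first) by one 64-bit value."""
    carry = 0                      # plays the role of RC/RS between elements
    out = []
    for limb in a_limbs:
        lo, carry = madded(limb, b, carry)
        out.append(lo)
    out.append(carry)              # final high half becomes the top limb
    return out
```

The sum cannot overflow 128 bits because (2^64 - 1)^2 + (2^64 - 1) = (2^64 - 1) * 2^64, which is why a single 64-bit carry register is sufficient between elements.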

# Appendix

See [[appendix]]