1 # RFC ls003 Big Integer

3 **URLs**:

5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>

6 * <https://libre-soc.org/openpower/sv/rfc/ls003/>

7 * <https://bugs.libre-soc.org/show_bug.cgi?id=960>

8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>

10 **Severity**: Major

12 **Status**: New

14 **Date**: 20 Oct 2022

16 **Target**: v3.2B

18 **Source**: v3.0B

20 **Books and Section affected**: **UPDATE**

22 ```

23 Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1

24 Appendix E Power ISA sorted by opcode

25 Appendix F Power ISA sorted by version

26 Appendix G Power ISA sorted by Compliancy Subset

27 Appendix H Power ISA sorted by mnemonic

28 ```

30 **Summary**

32 ```

33 Instructions added

34 maddedu - Multiply-Add Extended Double Unsigned

35 divmod2du - Divide/Modulo Quad-Double Unsigned

36 ```

38 **Submitter**: Luke Leighton (Libre-SOC)

40 **Requester**: Libre-SOC

42 **Impact on processor**:

44 ```

45 Addition of two new GPR-based instructions

46 ```

48 **Impact on software**:

50 ```

51 Requires support for new instructions in assembler, debuggers,

52 and related tools.

53 ```

55 **Keywords**:

57 ```

58 GPR, Big-integer, Double-word

59 ```

61 **Motivation**

63 Similar to `maddhdu` and `maddld`, but allow for a big-integer rolling

64 accumulation affect: `RC` effectively becomes a 64-bit carry in chains

65 of highly-efficient loop-unrolled arbitrary-length big-integer operations.

66 Similar to `divdeu`, and has similar advantages to `maddedu`,

67 Modulo result is available with the quotient in a single instruction

68 allowing highly-efficient arbitrary-length big-integer division.

70 **Notes and Observations**:

72 1. It is not practical to add Rc=1 variants as VA-Form is used and

73 there is a **pair** of results produced.

74 2. An overflow variant (XER.OV set) of `divmod2du` would be valuable

75 but VA-Form EXT004 is under severe pressure.

76 3. Both instructions have been present in Intel x86 for several decades.

77 4. Neither instruction is present in VSX: these are 128/64 whereas

78 VSX is 128/128.

79 5. `maddedu` and `divmod2du` are full inverses of each other, including

80 when used for arbitrary-length big-integer arithmetic

81 6. These are both 3-in 2-out instructions. If Power ISA did not already

82 have LD/ST-with-update instructions and instructions with `RAp`

83 and `RTp` then these instructions would not be proposed.

85 **Changes**

87 Add the following entries to:

89 * the Appendices of Book I

90 * Instructions of Book I added to Section 3.3.9.1

92 ----------------

94 \newpage{}

96 # Multiply-Add Extended Double Unsigned

98 `maddedu RT, RA, RB, RC`

100 | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |

101 |-------|------|-------|-------|-------|-------|---------|

102 | EXT04 | RT | RA | RB | RC | XO | VA-Form |

104 Pseudocode:

106 ```

107 prod[0:127] <- (RA) * (RB) # Multiply RA and RB, result 128-bit

108 sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product

109 RT <- sum[64:127] # Store low half in RT

110 RS <- sum[0:63] # RS implicit register, equal to RC

111 ```

113 Special registers altered:

115 None

117 RC is zero-extended (not shifted, not sign-extended), the 128-bit product added

118 to it; the lower half of that result stored in RT and the upper half

119 in RS.

121 The differences here to `maddhdu` are that `maddhdu` stores the upper

122 half in RT, where `maddedu` stores the upper half in RS.

124 The value stored in RT is exactly equivalent to `maddld` despite `maddld`

125 performing sign-extension on RC, because RT is the full mathematical result

126 modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical

127 results modulo 2^64. This is why there is no maddldu instruction.

129 RS is implictly defined as the same register as RC.

131 *Programmer's Note:

132 As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.

133 To achieve a big-integer rolling-accumulation effect:

134 assuming the scalar to multiply is in r0,

135 the vector to multiply by starts at r4 and the result vector

136 in r20, instructions may be issued `maddedu r20,r4,r0,r20

137 maddedu r21,r5,r0,r21` etc. where the first `maddedu` will have

138 stored the upper half of the 128-bit multiply into r21, such

139 that it may be picked up by the second `maddedu`. Repeat inline

140 to construct a larger bigint scalar-vector multiply,

141 as Scalar GPR register file space permits.*

143 Examples:

145 ```

146 # (r0 * r1) + r2, store lower in r4, upper in r2

147 maddedu r4, r0, r1, r2

148 ```

150 ----------

152 \newpage{}

154 # Divide/Modulo Quad-Double Unsigned

156 **Should name be Divide/Module Double Extended Unsigned?**

157 **Check the pseudo-code comments**

159 `divmod2du RT,RA,RB,RC`

161 | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |

162 |-------|------|-------|-------|-------|-------|---------|

163 | EXT04 | RT | RA | RB | RC | XO | VA-Form |

165 Pseudo-code:

167 ```

168 if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then # Check RA<RB, for divide-by-0

169 dividend[0:(XLEN*2)-1] <- (RA) || (RC) # Combine RA/RC, zero extend

170 divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB) # Extend to 128-bit

171 result <- dividend / divisor # Division

172 modulo <- dividend % divisor # Modulo

173 RT <- result[XLEN:(XLEN*2)-1] # Store result in RT

174 RS <- modulo[XLEN:(XLEN*2)-1] # Modulo in RC, implicit

175 else # In case of error

176 RT <- [1]*XLEN # RT all 1's

177 RS <- [0]*XLEN # RS all 0's

178 ```

180 Special registers altered:

182 None

184 Divide/Modulo Quad-Double Unsigned is another VA-Form instruction

185 that is near-identical to `divdeu` except that:

187 * the lower 64 bits of the dividend, instead of being zero, contain a

188 register, RC.

189 * it performs a fused divide and modulo in a single instruction, storing

190 the modulo in an implicit RS (similar to `maddedu`)

192 RB, the divisor, remains 64 bit. The instruction is therefore a 128/64

193 division, producing a (pair) of 64 bit result(s), in the same way that

194 Intel [divq](https://www.felixcloutier.com/x86/div) works.

195 Overflow conditions

196 are detected in exactly the same fashion as `divdeu`, except that rather

197 than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all

198 zeros on overflow.

200 *Programmer's note: there are no Rc variants of any of these VA-Form

201 instructions. `cmpi` will need to be used to detect overflow conditions:

202 the saving in instruction count is that both RT and RS will have already

203 been set to useful values (all 1s and all zeros respectively)

204 needed as part of implementing Knuth's Algorithm D*

206 For Scalar usage, just as for `maddedu`, `RS=RC`

207 Examples:

209 ```

210 # ((r0 << 64) + r2) / r1, store in r4

211 # ((r0 << 64) + r2) % r1, store in r2

212 divmod2du r4, r0, r1, r2

213 ```

215 [[!tag opf_rfc]]

217 # Appendices

219 Appendix E Power ISA sorted by opcode

220 Appendix F Power ISA sorted by version

221 Appendix G Power ISA sorted by Compliancy Subset

222 Appendix H Power ISA sorted by mnemonic

224 | Form | Book | Page | Version | mnemonic | Description |

225 |------|------|------|---------|----------|-------------|

226 | VA | I | # | 3.0B | maddedu | Multiply-Add Extend Double Unsigned |

227 | VA | I | # | 3.0B | divmod2du | Floatif Move | Divide/Modulo Quad-Double Unsigned