This page covers an analysis of big integer operations, to
work out optimal Scalar Instructions to propose for submission to
the OpenPOWER ISA WG, that when combined with Draft SVP64 give
-high performance compact Big Integer Vector Arithmetic.
+high performance compact Big Integer Vector Arithmetic. Leverage
+of existing Scalar Power ISA instructions is also explained.
Use of smaller sub-operations is a given: worst-case in a Scalar
context, addition is O(N) whilst multiply and divide are O(N^2),
level in back-end hardware that need only:
* read the first incoming XER.CA
-* implement a Vector-aware carry propagation algorithm
+* implement a large Vector-aware carry propagation algorithm
* store the very last XER.CA in the batch
The size and implementation of the underlying back-end SIMD ALU
-is entirely at the discretion of the implementer.
+is entirely at the discretion of the implementer, as is whether to
+deploy the above strategy. The only hard requirement for
+implementers of SVP64 is to comply with strict and precise Program Order
+even at the Element level.
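
The carry chain described above can be modelled as a sequential reference, against which any batched back-end would have to give identical results. This is a minimal sketch in Python, not the specification pseudocode: the names `big_add`, `ca_in` and `MASK64` are illustrative assumptions, and limbs are assumed to be 64-bit and stored least-significant first.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit element width

def big_add(a, b, ca_in=0):
    """Add two equal-length little-endian lists of 64-bit limbs."""
    assert len(a) == len(b)
    result = []
    ca = ca_in                      # read the first incoming XER.CA
    for x, y in zip(a, b):          # strict element-level Program Order
        total = x + y + ca
        result.append(total & MASK64)
        ca = total >> 64            # carry-out feeds the next element
    return result, ca               # the very last carry is stored in XER.CA

limbs, carry_out = big_add([MASK64, MASK64], [1, 0])
```

A hardware implementation is free to process several elements per cycle, provided the observable result matches this element-by-element order.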
If there is pressure on the register file (or
multi-million-digit big integers)
With effectively 5 operands (3 in, 2 out) some compromises are needed.
A little thought gives a useful workaround: two modes,
controlled by a single bit in `RM.EXTRA`, determine whether the 5th
-register is set to RC or whether to RT+VL. This then leaves only
+register is set to RC or whether to RT+MAXVL. This then leaves only
4 registers to qualify as scalar/vector, which can use four
EXTRA2 designators and fits into the available 9-bit space.
-RS=RT+VL Mode:
+RS=RT+MAXVL Mode:
product = RA*RB+RC
RT = lowerhalf(product)
- RS=RT+VL = upperhalf(product)
+ RS=RT+MAXVL = upperhalf(product)
and RS=RC Mode:
Now there is much more potential, including setting RC to a Scalar,
which would be useful as a 64 bit Carry. RC as a Vector would produce
-a Vector of the HI halves of a Vector of multiplies. RS=RT+VL Mode
+a Vector of the HI halves of a Vector of multiplies. RS=RT+MAXVL Mode
would allow that same Vector of HI halves to not be an overwrite of RC.
Also it is possible to specify that any of RA, RB or RC are scalar or
vector. Overall it is extremely powerful.
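
The two modes can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the instruction's definition: `maddedu`, `big_mul_scalar` and `vec_mul_split` are hypothetical helper names, operands are assumed to be unsigned 64-bit, and vectors are plain lists with the least-significant limb first.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit element width

def maddedu(ra, rb, rc):
    """product = RA*RB+RC; returns (RT, RS) = (lower half, upper half)."""
    product = ra * rb + rc          # always fits in 128 bits
    return product & MASK64, product >> 64

def big_mul_scalar(a, rb):
    """RS=RC mode with a scalar RC acting as a 64-bit carry."""
    rc = 0
    out = []
    for ra in a:                    # vector RA, scalar RB, scalar RC
        rt, rc = maddedu(ra, rb, rc)  # RS overwrites RC each element
        out.append(rt)
    out.append(rc)                  # final carry is the top limb
    return out

def vec_mul_split(a, b, c):
    """Separate-RS sketch: HI halves go to their own vector, RC is kept."""
    pairs = [maddedu(ra, rb, rc) for ra, rb, rc in zip(a, b, c)]
    return [lo for lo, _ in pairs], [hi for _, hi in pairs]

def to_int(limbs):
    """Helper for checking: little-endian limbs to a Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

product_limbs = big_mul_scalar([MASK64, MASK64], MASK64)
```

In `big_mul_scalar` the upper half of each product becomes the carry into the next element, which is why a scalar RC is useful as a 64-bit Carry.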
Again, in an SVP64 context, using EXTRA mode bit 8 allows for
selecting whether `RS=RC` or
-`RS=RT+VL`. Similar flexibility in the scalar-vector settings
+`RS=RT+MAXVL`. Similar flexibility in the scalar-vector settings
allows the instruction to perform full parallel vector div/mod,
or act in loop-back mode for big-int division by a scalar,
or for a single scalar 128/64 div/mod.
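
The loop-back use for big-int division by a scalar can be sketched in Python as follows. The operand roles are an assumption made for illustration (RC holding the upper 64 bits of the dividend, RA the lower, RB the divisor, RT the quotient and RS the remainder), as are the names `divmod2du_model` and `big_div_scalar`. Overflow and divide-by-zero handling are deliberately not modelled.

```python
def divmod2du_model(ra, rb, rc):
    """Assumed semantics: (RC:RA) / RB, giving (RT, RS) = (quotient, remainder)."""
    dividend = (rc << 64) | ra      # 128-bit dividend from two 64-bit halves
    return dividend // rb, dividend % rb

def big_div_scalar(a, rb):
    """Divide little-endian 64-bit limbs `a` by scalar `rb`, RS=RC loop-back."""
    rc = 0                          # incoming remainder starts at zero
    quotient = [0] * len(a)
    for i in reversed(range(len(a))):   # most-significant limb first
        # the remainder (RS) loops back as the next element's upper half (RC)
        quotient[i], rc = divmod2du_model(a[i], rb, rc)
    return quotient, rc

def to_int(limbs):
    """Helper for checking: little-endian limbs to a Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

q, r = big_div_scalar([5, 7], 3)
```

Because each remainder is strictly less than the divisor, every per-element quotient fits in 64 bits, so the loop never hits the overflow case that a single 128/64 divide can encounter.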