(no commit message)

author lkcl <lkcl@web>

Fri, 22 Apr 2022 13:18:13 +0000 (14:18 +0100)

committer IkiWiki <ikiwiki.info>

Fri, 22 Apr 2022 13:18:13 +0000 (14:18 +0100)
diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn

index af7cb91f0e159be481b1a6a2b8b4ad62211a3651..55652bdbe97fd0513cd0288036b172953422f923 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -25,7 +25,7 @@ Links
  * <https://www.reddit.com/r/OpenPOWER/comments/u8r4vf/draft_svp64_biginteger_vector_arithmetic_for_the/>
  * <https://bugs.libre-soc.org/show_bug.cgi?id=817>
  
-# Add and Subtract
+# Vector Add and Subtract
  
  Surprisingly, no new additional instructions are required to perform
  a straightforward big-integer add or subtract.  Vectorised `adde`
@@ -93,7 +93,7 @@ to people unfamiliar with Cray-style Vectors: if VL is not
  permitted to exceed 1 (because MAXVL is set to 1) then the above
  actually becomes a Scalar Big-Int add algorithm.
  
-# Multiply
+# Vector Multiply
  
  Long-multiply, assuming an O(N^2) algorithm, is performed by summing
  NxN separate smaller multiplications together.  Karatsuba's algorithm
@@ -244,7 +244,7 @@ would allow that same Vector of HI halves to not be an overwrite of RC.
  Also it is possible to specify that any of RA, RB or RC are scalar or
  vector. Overall it is extremely powerful.
  
-# Divide
+# Vector Divide
  
  The simplest implementation of big-int divide is the standard schoolbook
  "Long Division", set with RADIX 64 instead of Base 10. Donald Knuth's
@@ -303,7 +303,7 @@ bit at the end. Logically: if borrow was required then the qhat estimate
  was too large and the correction is required, which is, again,
  nothing more than a Vectorised big-integer add (one instruction).
  
-# 128-bit Scalar divisor
+# Scalar 128-bit divisor
  
  As mentioned above, the first part of the Knuth Algorithm D involves
  computing an estimate for the divisor. This involves using the three
@@ -332,19 +332,23 @@ cover a 64/32 operation (64-bit dividend, 32-bit divisor):
  
  However when moving to 64-bit digits (desirable because the algorithm
  is `O(N^2)`) this in turn means that the estimate has to be computed
-from a *128* bit dividend and a 64-bit divisor.  This operation does
-not exist in most Scalar 64-bit ISAs, and some investigation into
+from a *128* bit dividend and a 64-bit divisor.  Such an operation 
+simply does not exist in most Scalar 64-bit ISAs.  For Power ISA
+it would be necessary to implement Packed SIMD instructions
+and infrastructure in order to utilise `vdivuq` which is a 128/128
+(quad) divide, not a 128/64.  Some investigation into
  soft-implementations of 128/128 or 128/64 divide show it to be typically
-implemented bit-wise.
+implemented bit-wise, with all that implies.
  
  The irony is, therefore, that attempting to
  improve big-integer divide by moving to 64-bit digits in order to take
-advantage of the efficiency of 64-bit scalar multiply would instead
+advantage of the efficiency of 64-bit scalar multiply when Vectorised
+would instead
  lock up CPU time performing a 128/64 division.  With the Vector Multiply
  operations being critically dependent on that `qhat` estimate, and
  because that scalar is as an input into each of the vector digit
-multiples, as a Dependency Hazard it would have the Parallel SIMD Multiply
-back-ends sitting 100% idle, waiting for that one scalar value.
+multiples, as a Dependency Hazard it would cause *all* Parallel
+SIMD Multiply back-ends to sit 100% idle, waiting for that one scalar value.
  
  Whilst one solution is to reduce the digit width to 32-bit in order to
  go back to 64/32 divide, this increases the completion time by a factor
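
Note on the last hunk: to illustrate what "implemented bit-wise" means for a soft 128/64 divide, the following is a minimal sketch of a restoring (shift-and-subtract) divider. It is not taken from this commit or from any particular library. The name `divmod_128_by_64` and the `hi`/`lo` split of the dividend are hypothetical choices made for illustration, and the sketch assumes `hi < divisor` so that the quotient fits in 64 bits.

```python
MASK64 = (1 << 64) - 1


def divmod_128_by_64(hi, lo, divisor):
    """Divide the 128-bit value (hi << 64) | lo by a 64-bit divisor.

    Hypothetical helper: restoring shift-and-subtract division that
    produces one quotient bit per loop iteration.
    """
    if not 0 < divisor <= MASK64:
        raise ValueError("divisor must be a non-zero 64-bit value")
    if not (0 <= hi <= MASK64 and 0 <= lo <= MASK64):
        raise ValueError("hi and lo must each fit in 64 bits")
    if hi >= divisor:
        raise OverflowError("quotient does not fit in 64 bits")

    rem = hi
    quot = 0
    for i in range(63, -1, -1):
        # Shift the next dividend bit into the running remainder.
        # In hardware or C this intermediate value needs 65 bits.
        rem = (rem << 1) | ((lo >> i) & 1)
        quot <<= 1
        if rem >= divisor:
            # The subtraction succeeds, so this quotient bit is 1.
            rem -= divisor
            quot |= 1
    return quot, rem
```

Each of the 64 iterations depends on the remainder from the previous one, so the loop cannot be shortened by issuing the steps in parallel. That serial chain is what would hold up the Vector Multiply operations waiting on the `qhat` estimate.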