(no commit message)

author lkcl <lkcl@web>

Mon, 25 Apr 2022 15:15:44 +0000 (16:15 +0100)

committer IkiWiki <ikiwiki.info>

Mon, 25 Apr 2022 15:15:44 +0000 (16:15 +0100)
author lkcl <lkcl@web>
Mon, 25 Apr 2022 15:15:44 +0000 (16:15 +0100)
committer IkiWiki <ikiwiki.info>
Mon, 25 Apr 2022 15:15:44 +0000 (16:15 +0100)
diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn

index e5723d02a679311ebf92df4c78ae0129f3696be0..8e7d6c0ed06072631147114ac91477bf7984ed3b 100644 (file)
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -303,8 +303,9 @@ Detection of the fixup (phase 3) is determined by the Carry (borrow)
  bit at the end. Logically: if borrow was required then the qhat estimate
  was too large and the correction is required, which is, again,
  nothing more than a Vectorised big-integer add (one instruction).
+However this is not the full story
  
-# Scalar 128-bit divisor
+**128/64-bit divisor**
  
  As mentioned above, the first part of the Knuth Algorithm D involves
  computing an estimate for the divisor. This involves using the three
@@ -345,8 +346,8 @@ The irony is, therefore, that attempting to
  improve big-integer divide by moving to 64-bit digits in order to take
  advantage of the efficiency of 64-bit scalar multiply when Vectorised
  would instead
-lock up CPU time performing a 128/64 division.  With the Vector Multiply
-operations being critically dependent on that `qhat` estimate, and
+lock up CPU time performing a 128/64 scalar division.  With the Vector 
+Multiply operations being critically dependent on that `qhat` estimate, and
  because that scalar is as an input into each of the vector digit
  multiples, as a Dependency Hazard it would cause *all* Parallel
  SIMD Multiply back-ends to sit 100% idle, waiting for that one scalar value.
@@ -355,8 +356,7 @@ Whilst one solution is to reduce the digit width to 32-bit in order to
  go back to 64/32 divide, this increases the completion time by a factor
  of 4 due to the algorithm being `O(N^2)`.
  
-*(TODO continue from here: reference investigation into using Goldschmidt 
-division with fixed-width numbers, to get number of iterations down)*
+**Reducing completion time of 128/64-bit Scalar division**
  
  Scalar division is a known computer science problem because, as even the
  Big-Int Divide shows, it requires looping around a multiply (or, if reduced
@@ -370,6 +370,8 @@ In this way a Scalar Integer divide can be performed in the same
  time-order as Newton-Raphson, using two hardware multipliers
  and a subtract.
  
+**Back to Vector carry-looping**
+
  There is however another reason for having a 128/64 division
  instruction, and it's effectively the reverse of `madded`.
  Look closely at Algorithm D when the divisor is only a scalar
author	lkcl <lkcl@web>
	Mon, 25 Apr 2022 15:15:44 +0000 (16:15 +0100)
committer	IkiWiki <ikiwiki.info>
	Mon, 25 Apr 2022 15:15:44 +0000 (16:15 +0100)