(no commit message)

author lkcl <lkcl@web>

Thu, 21 Apr 2022 09:45:39 +0000 (10:45 +0100)

committer IkiWiki <ikiwiki.info>

Thu, 21 Apr 2022 09:45:39 +0000 (10:45 +0100)
author lkcl <lkcl@web>
Thu, 21 Apr 2022 09:45:39 +0000 (10:45 +0100)
committer IkiWiki <ikiwiki.info>
Thu, 21 Apr 2022 09:45:39 +0000 (10:45 +0100)
diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn

index 52174220f250ec0e73979ec952598da233cb685f..cbb061ab802a0eea836051337d7f38ffb18cc69b 100644 (file)
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -1,8 +1,9 @@
  # Analysis
  
-This section covers an analysis of big integer operations.  Use of
-smaller sub-operations is a given: worst-case, addition is O(N)
-whilst multiply and divide are O(N^2).
+This page covers an analysis of big integer operations, to
+work out an optimal Vector Instruction for addition to Draft
+SVP64.  Use of smaller sub-operations is a given: worst-case,
+addition is O(N) whilst multiply and divide are O(N^2).
  
  Links
  
@@ -125,15 +126,28 @@ purpose.
  Successive iterations effectively use RC as a 64-bit carry, and
  as noted by Intel in their notes on mulx,
  RA*RB+RC+RD cannot overflow, so does not require
-setting an additional CA flag.
-
-Combined with a Vectorised big-int `sv.adde` the key inner loop of
+setting an additional CA flag, we first cover the chain of
+RA*RB+RC as follows:
+
+    RT0, RC0 = RA0 * RB0 + RC
+          |
+          +----------------+
+                           |
+    RT1, RC1 = RA1 * RB1 + RC0
+          |
+          +----------------+
+                           |
+    RT2, RC2 = RA2 * RB2 + RC1
+
+Following up to add each partially-computed row to what will become
+the final result is achieved with a Vectorised big-int
+`sv.adde`. Thus, the key inner loop of
  Knuth's Algorithm M may be achieved in four instructions, two of
  which are scalar initialisation:
  
-    li r16, 0                     # zero accululator
-    addic r16, r16, 0             # CA to zero as well
+    li r16, 0                     # zero accumulator
      sv.madde r0.v, r8.v, r17, r16 # mul vector 
+    addic r16, r16, 0             # CA to zero as well
      sv.adde r24.v, r24.v, r0.v   # big-add row to result
      
  Normally, in a Scalar ISA, the use of a register as both a source
author	lkcl <lkcl@web>
	Thu, 21 Apr 2022 09:45:39 +0000 (10:45 +0100)
committer	IkiWiki <ikiwiki.info>
	Thu, 21 Apr 2022 09:45:39 +0000 (10:45 +0100)