(no commit message)

author lkcl <lkcl@web>

Sun, 1 May 2022 23:13:36 +0000 (00:13 +0100)

committer IkiWiki <ikiwiki.info>

Sun, 1 May 2022 23:13:36 +0000 (00:13 +0100)
diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn

index 3e59236ae0997593596295024ba3b197dae83427..2523bf377bede6964aa0018acd80672c5ebef79d 100644 (file)
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -529,6 +529,35 @@ subsequent iterations the new k, being the modulo, is always less than the
  divisor as well. Thus the condition (the loop invariant) `RC < RA`
  is preserved, as long as RC starts at zero.
  
+**Limitations**
+
+One of the worst situations for any ISA is when an algorithm's completion
+time depends directly on how many cycles individual instructions take
+in a given implementation.  Knuth's Big-Integer division is unfortunately
+one such algorithm, whereas Big-Integer Goldschmidt divide is not.
+
+Assuming that the computation of qhat takes 128 cycles to complete on
+a small power-efficient embedded design, this time would dominate
+compared to the 64-bit multiplications.  However, if the element width
+were reduced to 8, such that the computation of qhat took only 16 cycles,
+the calculation of qhat would no longer dominate, but the number of
+multiplications would rise: somewhere in between there would be an
+elwidth and a Vector Length that would suit that particular embedded
+processor.
+
+By contrast, a high-performance microarchitecture may deploy Goldschmidt
+or other efficient Scalar Division, which could complete a 128/64 qhat
+computation in, say, only 5 to 8 cycles, which would be tolerable.
+Thus, for general-purpose software, it would be necessary to ship
+multiple implementations of the same algorithm and dynamically
+select the best one.
+
+The very fact that programmers even have to consider multiple
+implementations and compare their performance is an unavoidable nuisance.
+SVP64 is intended to be designed such that only one implementation of
+any given algorithm is needed. In some ways it is reassuring that
+some algorithms just do not fit that goal.
+
  # Conclusion
  
  TODO