From b69877dcf6588120097d1227ad74e0fdcbf4923a Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 2 May 2022 00:13:36 +0100
Subject: [PATCH]

---
 openpower/sv/biginteger/analysis.mdwn | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 3e59236ae..2523bf377 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -529,6 +529,35 @@ subsequent iterations the new k, being the modulo, is always less than
 the divisor as well. Thus the condition (the loop invariant) `RC < RA`
 is preserved, as long as RC starts at zero.
 
+**Limitations**
+
+One of the worst things for any ISA is when an algorithm's completion
+time depends directly on how long individual instructions take, which
+varies from one implementation to another. Knuth's Big-Integer division
+is unfortunately one such algorithm, whereas Big-Integer Goldschmidt
+division is not.
+
+Suppose computing qhat takes 128 cycles on a small, power-efficient
+embedded design: that time would dominate over the 64-bit
+multiplications. However, if the element width were reduced to 8 bits,
+so that computing qhat took only 16 cycles, qhat would no longer
+dominate, but the number of multiplications would rise. Somewhere in
+between there would be an elwidth and a Vector Length that suit that
+particular embedded processor.
+
+By contrast, a high-performance microarchitecture may deploy Goldschmidt
+or another efficient scalar division, which could complete the 128/64
+qhat computation in, say, only 5 to 8 cycles: a tolerable cost.
+Thus, for general-purpose software, it would be necessary to ship
+multiple implementations of the same algorithm and select the best one
+dynamically at runtime.
+
+The very fact that programmers even have to consider multiple
+implementations and compare their performance is an unavoidable
+nuisance. The goal of SVP64 is that only one implementation of any
+given algorithm should be needed. In some ways it is reassuring that
+some algorithms just don't fit.
+
 # Conclusion
 
 TODO
-- 
2.30.2