divisor as well. Thus the condition (the loop invariant) `RC < RA`
is preserved, as long as RC starts at zero.
+**Limitations**
+
+One of the worst properties an algorithm can have on any ISA is a completion
+time that depends directly on how long each implementation takes to execute
+a given instruction. Knuth's Big-Integer division is unfortunately
+one such algorithm, whereas Big-Integer Goldschmidt divide is not.
+
+Assuming that the computation of qhat takes 128 cycles to complete on
+a small power-efficient embedded design, this time would dominate
+the 64-bit multiplications. However, if the element width
+were reduced to 8, such that the computation of qhat took only 16 cycles,
+the calculation of qhat would no longer dominate, but the number of
+multiplications would rise. Somewhere in between there would be an
+elwidth and a Vector Length that would suit that particular embedded
+processor.
+
+By contrast, a high-performance microarchitecture may deploy Goldschmidt
+or another efficient Scalar Division, which could complete a 128/64 qhat
+computation in, say, only 5 to 8 cycles, which would be tolerable.
+Thus, for general-purpose software, it would be necessary to ship
+multiple implementations of the same algorithm and dynamically
+select the best one.
+
+The very fact that programmers even have to consider multiple
+implementations and compare their performance is an unavoidable nuisance.
+SVP64 is intended to be designed such that only one implementation of
+any given algorithm is needed. In some ways it is reassuring that
+some algorithms just don't fit.
+
# Conclusion
TODO