the boundaries of the number of registers needed, so is left as an
exercise for another time.
+# Conclusion
+
+Where a normal SIMD ISA requires explicit hand-crafted optimisation
+in order to achieve full utilisation of the underlying hardware,
+Simple-V instead can rely to a large extent on standard Multi-Issue
+hardware to achieve similar performance, whilst crucially keeping the
+algorithm implementation down to a shockingly-simple degree that makes
+it easy to understand and easy to review. As with many other
+algorithms implemented in Simple-V SVP64, keeping to a
+LOAD-COMPUTE-STORE paradigm minimises L1 Data Cache usage. In this
+case, just as with chacha20, the entire algorithm is only 9 lines of
+assembler occupying 13 4-byte words (52 bytes), so it can fit
+into a single L1 I-Cache Line without triggering Virtual Memory TLB
+misses.
+
+Further performance improvements are achievable by using REMAP
+Parallel Reduction, still fitting into a single L1 I-Cache Line,
+but beginning to approach the limit of the 128-entry register file.
+
[[!tag svp64_cookbook ]]