to achieve similar performance, whilst crucially keeping the algorithm
implementation down to a shockingly-simple degree that makes it easy to
understand and easy to review. Again also as with many other algorithms
-when implemented in Simple-V SVP54, by keeping to a LOAD-COMPUTE-STORE
+when implemented in Simple-V SVP64, by keeping to a LOAD-COMPUTE-STORE
paradigm the L1 Data Cache usage is minimised, and in this case, just
as with chacha20, the entire algorithm, being only 9 lines of assembler
fitting into 13 4-byte words, can fit into a single L1 I-Cache Line