From 483acaa68ff558f056433417c79968931c7c6e92 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Wed, 6 Dec 2023 14:39:47 +0000
Subject: [PATCH] add pospopcount conclusion bug #672

---
 openpower/sv/cookbook/pospopcnt.mdwn | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/openpower/sv/cookbook/pospopcnt.mdwn b/openpower/sv/cookbook/pospopcnt.mdwn
index cec72685f..d3db8bb42 100644
--- a/openpower/sv/cookbook/pospopcnt.mdwn
+++ b/openpower/sv/cookbook/pospopcnt.mdwn
@@ -226,4 +226,23 @@ of straight accumulators `r16-r23`. However this starts to push
 the boundaries of the number of registers needed, so as an exercise
 is left for another time.
 
+# Conclusion
+
+Where a normal SIMD ISA requires explicit hand-crafted optimisation
+in order to achieve full utilisation of the underlying hardware,
+Simple-V can instead rely to a large extent on standard Multi-Issue
+hardware to achieve similar performance, whilst crucially keeping the
+algorithm implementation down to a shockingly-simple degree that makes
+it easy to understand and easy to review. As with many other
+algorithms implemented in Simple-V SVP64, keeping to a
+LOAD-COMPUTE-STORE paradigm minimises L1 Data Cache usage.
+In this case, just as with chacha20, the entire algorithm is
+only 9 lines of assembler occupying 13 4-byte words (52 bytes),
+so it can fit into a single L1 I-Cache Line without triggering
+Virtual Memory TLB misses.
+
+Further performance improvements are achievable by using REMAP
+Parallel Reduction, still fitting into a single L1 Cache line,
+but beginning to approach the limit of the 128-entry register file.
+
 [[!tag svp64_cookbook ]]
-- 
2.30.2