* Vectorisation typically includes much more comprehensive memory load
and store schemes (unit stride, constant-stride and indexed), which
in turn have ramifications: virtual memory misses (TLB cache misses)
and even multiple page-faults... all caused by a *single instruction*,
yet with the clear benefit that the regularisation of LOAD/STOREs can
be optimised for minimal impact on caches and maximised throughput.
* By contrast, SIMD can use "standard" memory load/stores (32-bit aligned
to pages), and these load/stores have absolutely nothing to do with the
SIMD / ALU engine, no matter how wide the operand. This is simpler,
but puts more pressure on the instruction and data caches.
Overall it makes a huge amount of sense to have a means and method
of introducing instruction parallelism in a flexible way that provides