(no commit message)

author lkcl <lkcl@web>

Mon, 9 May 2022 10:08:47 +0000 (11:08 +0100)

committer IkiWiki <ikiwiki.info>

Mon, 9 May 2022 10:08:47 +0000 (11:08 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn

index 9fb2c90100b373a934b432cc7357c038504638ad..d2bfbb44a82164815d5fd8f5af4d7d9f151ec15b 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -855,6 +855,14 @@ combined with SVREMAP Matrix Schedules.  Imagine that SVREMAP has been
  extended, Snitch-style, to perform a deterministic memory-array walk of
  a large Matrix.
  
+*<blockquote>
+* **Horizontal-First**: (aka standard Cray Vectors) walk through
+  **elements** first before moving to next instruction
+* **Vertical-First**: walk through **instructions** before
+  moving to next element.  Currently managed by `svstep`,
+  ZOLC may be deployed to manage the stepping.
+</blockquote>*
+
  Let us also imagine that the Matrices are stored in Memory with PEs
  attached, and that the PEs are fully functioning Power ISA with Draft
  SVP64, but their Multiply capability is not as good as the main CPU.
@@ -884,7 +892,7 @@ L1/L2/L3 Caches only to find, at the CPU, that it is zero.
  The reason in this case for the use of Vertical-First Mode is the
  conditional execution of the Multiply-and-Accumulate.
  Horizontal-First Mode is the standard Cray-Style Vectorisation:
-loop on all elements with the same instruction before moving
+loop on all *elements* with the same instruction before moving
  on to the next instruction. Predication needs to be pre-calculated
  for the entire Vector in order to exclude certain elements from
  the computation. In this case, that's an expensive inconvenience 
@@ -930,7 +938,7 @@ a RADIX MMU and associated TLB-aware minimal L1 Cache, in order
  to support OpenCAPI properly? The answer is very likely to be yes.
  The saving grace here is that with
  the expectation of running only hot-loops with ZOLC-driven
-binaries, the size of each PE's
+binaries, the size of each PE's TLB-aware
  L1 Cache needed would be miniscule compared
  to the average high-end CPU.
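
Review note: the two loop orderings defined in the added blockquote can be illustrated with a minimal sketch. This is only a model of the iteration order, not of SVP64 itself: the function names and the `mul`/`add` instruction labels are hypothetical, and `svstep`, ZOLC and predication are not modelled.

```python
def horizontal_first(instructions, num_elements):
    """Cray-style order: one instruction walks all elements,
    then the next instruction starts."""
    return [(insn, el)
            for insn in instructions
            for el in range(num_elements)]


def vertical_first(instructions, num_elements):
    """Vertical-First order: one element walks all instructions,
    then execution steps to the next element."""
    return [(insn, el)
            for el in range(num_elements)
            for insn in instructions]


if __name__ == "__main__":
    insns = ["mul", "add"]  # hypothetical instruction labels
    print(horizontal_first(insns, 3))
    print(vertical_first(insns, 3))
```

Both functions visit the same set of (instruction, element) pairs; only the nesting of the two loops differs. In the Vertical-First order a decision made for one element, such as skipping the multiply when an operand is zero, can take effect before the next element is touched, which is the property the patched paragraph relies on.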