for several decades, and advanced programmers are comfortable with the
practice.
Additional questions remain as to whether OpenCAPI, or its use for this
particular scenario, requires that the PEs, even quite basic ones,
implement a full RADIX MMU and associated TLB lookup. In order to ensure
that programs may be cleanly and seamlessly transferred between the PEs
and the CPU, the answer is quite likely to be "yes", which is interesting
in and of itself. Fortunately, the associated L1 Cache with TLB
Translation does not have to be large, and the actual RADIX Tree Walk
need not explicitly be done by the PEs: it can be handled by the main
CPU as a software extension.
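
As a purely illustrative sketch of that delegation (the names, sizes and
the trivial "translation" below are hypothetical, not an actual OpenCAPI
or Power ISA interface), a PE can get by with a tiny TLB and simply hand
any miss to the main CPU, which performs the RADIX Tree Walk in software
and returns a translation for the PE to cache:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical 8-entry, direct-mapped PE TLB: small enough to be
 * affordable in even a very basic Processing Element.             */
#define PE_TLB_ENTRIES 8
#define PAGE_SHIFT     12          /* 4 KiB pages, for illustration */

typedef struct {
    bool     valid;
    uint64_t vpn;                  /* virtual page number         */
    uint64_t rpn;                  /* real (physical) page number */
} pe_tlb_entry_t;

static pe_tlb_entry_t pe_tlb[PE_TLB_ENTRIES];

/* Stand-in for the main CPU performing the RADIX tree walk in
 * software on the PE's behalf (here: a trivial identity mapping). */
static uint64_t cpu_software_radix_walk(uint64_t vpn)
{
    printf("PE TLB miss: CPU walks RADIX tree for VPN 0x%llx\n",
           (unsigned long long)vpn);
    return vpn;                    /* placeholder translation */
}

/* PE-side translation: hit in the tiny TLB, or delegate the walk. */
static uint64_t pe_translate(uint64_t vaddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    pe_tlb_entry_t *e = &pe_tlb[vpn % PE_TLB_ENTRIES];

    if (!(e->valid && e->vpn == vpn)) {
        /* Miss: the PE does not walk the tree itself, the CPU does. */
        e->rpn   = cpu_software_radix_walk(vpn);
        e->vpn   = vpn;
        e->valid = true;
    }
    return (e->rpn << PAGE_SHIFT) | (vaddr & ((1ULL << PAGE_SHIFT) - 1));
}

int main(void)
{
    printf("0x%llx\n", (unsigned long long)pe_translate(0x12345678ULL));
    /* second access to the same page hits the PE's own TLB */
    printf("0x%llx\n", (unsigned long long)pe_translate(0x12345abcULL));
    return 0;
}
```
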
**Use-case: Matrix and Convolutions**

Imagine a large Matrix scenario, with several values close to zero: a
test against zero may be carried out by the PEs, in-memory, such that
skipping of those elements may be achieved without having to pass
unnecessary data through L1/L2/L3 Caches only to find, at the CPU, that
it is zero.
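
As a software-level sketch only (an actual PE would perform this test
in hardware at the memory interface; the function name and the near-zero
threshold here are made up for illustration), the effect is that only
the elements worth computing with ever get sent onwards to the CPU:

```c
#include <stddef.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical threshold below which an element is treated as zero. */
#define NEAR_ZERO 1e-9

/* PE-side scan of a row held in local memory: only elements worth
 * multiplying are reported, so near-zero data never has to travel
 * through the CPU's L1/L2/L3 Caches at all.                        */
static size_t pe_scan_nonzero(const double *row, size_t n,
                              size_t *idx_out, double *val_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fabs(row[i]) > NEAR_ZERO) {
            idx_out[count] = i;       /* element is worth sending */
            val_out[count] = row[i];
            count++;
        }
    }
    return count;                     /* how many elements the CPU sees */
}

int main(void)
{
    double row[8] = { 0.0, 3.5, 0.0, 0.0, -1.25, 0.0, 0.0, 2.0 };
    size_t idx[8];
    double val[8];

    size_t n = pe_scan_nonzero(row, 8, idx, val);
    printf("CPU receives %zu of 8 elements\n", n);   /* 3 of 8 */
    return 0;
}
```
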
**Use-case: More powerful PEs in-memory**

An obvious variant of the above is that, if there is inherently
more parallelism in the data set, then the PEs get their own
Multiply-and-Accumulate instruction and, rather than send the
data to the CPU over OpenCAPI, perform the Matrix-Multiply
directly themselves.

However, the source code and binary would be near-identical, if
not identical in every respect, with the PEs implementing the full
ZOLC capability in order to compact the binary size to the bare minimum.
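
A minimal sketch of that point follows: nothing in it is OpenCAPI- or
SVP64-specific, which is precisely the point. The same
Multiply-and-Accumulate loop nest may be compiled for the CPU or for a
more capable PE, with a ZOLC-capable PE turning the loop control into
zero-overhead hardware loops (the dimensions below are illustrative):

```c
#include <stdio.h>

/* The same source, whether compiled for the CPU or for a PE: the
 * inner statement is a single Multiply-and-Accumulate, and the
 * surrounding loop nest is exactly what ZOLC hardware loops would
 * carry on a PE.                                                  */
#define N 4

static void matmul_mac(const float a[N][N], const float b[N][N],
                       float c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += a[i][k] * b[k][j];   /* the MAC operation */
            c[i][j] = acc;
        }
}

int main(void)
{
    float a[N][N], b[N][N], c[N][N];

    /* identity times a ramp: c should come out equal to b */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (i == j) ? 1.0f : 0.0f;
            b[i][j] = (float)(i * N + j);
        }

    matmul_mac(a, b, c);
    printf("c[2][3] = %g\n", c[2][3]);      /* prints 11 */
    return 0;
}
```
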
One key strategic question does remain: do the PEs need to have
a RADIX MMU and an associated TLB-aware minimal L1 Cache in order
to support OpenCAPI properly?

**Roadmap summary of Advanced SVP64**
The future direction for SVP64, then, is: