From 1c361a3eb28daff34647be0c662493957044c307 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Thu, 12 May 2022 12:49:16 +0100
Subject: [PATCH]

---
 openpower/sv/SimpleV_rationale.mdwn | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index 7b937d750..2e9f78e8e 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -987,7 +987,7 @@ to the average high-end CPU.
 
 The informed reader will have noted the remarkable similarity between how
 a CPU communicates with a GPU to schedule tasks, and the proposed
-architecture.  CPUs schedule tasks as follows:
+architecture.  CPUs schedule tasks on GPUs as follows:
 
 * User-space program encounters an OpenGL function, in the
   CPU's ISA.
@@ -995,7 +995,23 @@ architecture.  CPUs schedule tasks as follows:
   Shader Binary written in the GPU's ISA.
 * GPU Driver wishes to transfer both the data and the Shader Binary
   to the GPU. Both may only do so via Shared Memory, usually
-  DMA over PCIe.
+  DMA over PCIe (assuming a PCIe Graphics Card).
+* GPU Driver, which has been running in CPU userspace, notifies CPU
+  kernelspace of the desire to transfer data and GPU Shader Binary
+  to the GPU. A context-switch occurs...
+
+It is almost unfair to burden the reader with further details.
+The extraordinarily convoluted procedure is as bad as it sounds. Hundreds
+of thousands of tasks per second are scheduled this way, with hundreds
+of megabytes of data per second being exchanged as well.
+
+Yet, the process is not that different from how things would work
+with the proposed microarchitecture: the differences, however, are key.
+
+* Both PEs and CPU run the exact same ISA.  A major complexity of 3D GPU
+  and CUDA workloads (JIT compilation etc.) is eliminated, and, crucially,
+  the CPU may execute the PE's tasks, if needed.
+
 
 **Roadmap summary of Advanced SVP64**
 
-- 
2.30.2