The informed reader will have noted the remarkable similarity between how
a CPU communicates with a GPU to schedule tasks, and the proposed
architecture. CPUs schedule tasks with GPUs as follows:

* User-space program encounters an OpenGL function call, in the
  CPU's ISA.
* GPU Driver, still running in the CPU's ISA, prepares a
  Shader Binary written in the GPU's ISA.
* GPU Driver wishes to transfer both the data and the Shader Binary
  to the GPU. Both may only be moved via Shared Memory, usually
  DMA over PCIe (assuming a PCIe Graphics Card).
* GPU Driver, which has been running in CPU userspace, notifies CPU
  kernelspace of the desire to transfer data and GPU Shader Binary
  to the GPU. A context-switch occurs... (the hand-off is sketched
  just below).
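
To make that hand-off concrete, here is a minimal sketch of the userspace
side in C, assuming a hypothetical character device `/dev/gpu0`, a
hypothetical `GPU_SUBMIT` ioctl and a hypothetical `struct gpu_submit_desc`
(no real driver's ABI is implied): the driver packs pointers to the data
and the Shader Binary into a descriptor and asks kernelspace to set up the
DMA transfer over PCIe.

```c
/* Illustrative only: /dev/gpu0, GPU_SUBMIT and struct gpu_submit_desc
 * are hypothetical names, not any real driver's ABI. */
#include <stdint.h>
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

struct gpu_submit_desc {
    uint64_t data_addr;    /* userspace pointer to vertex/texture data       */
    uint64_t data_len;
    uint64_t shader_addr;  /* userspace pointer to the GPU-ISA Shader Binary */
    uint64_t shader_len;
};

#define GPU_SUBMIT _IOW('G', 1, struct gpu_submit_desc)

/* Userspace side of the GPU Driver: hand both buffers to kernelspace,
 * which pins the pages and programs the DMA engine over PCIe. */
int submit_to_gpu(const void *data, size_t data_len,
                  const void *shader, size_t shader_len)
{
    struct gpu_submit_desc d = {
        .data_addr   = (uint64_t)(uintptr_t)data,
        .data_len    = data_len,
        .shader_addr = (uint64_t)(uintptr_t)shader,
        .shader_len  = shader_len,
    };
    int fd = open("/dev/gpu0", O_RDWR);
    if (fd < 0)
        return -1;
    /* The ioctl() is where the context-switch into kernelspace happens. */
    int ret = ioctl(fd, GPU_SUBMIT, &d);
    close(fd);
    return ret;
}
```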

It is almost unfair to burden the reader with further details.
The extraordinarily convoluted procedure is as bad as it sounds. Hundreds
of thousands of tasks per second are scheduled this way, with hundreds
of megabytes of data per second being exchanged as well.

Yet the process is not that different from how things would work
with the proposed microarchitecture; the differences, however, are key.

* Both the PEs and the CPU run the exact same ISA. A major complexity of
  3D GPU and CUDA workloads (JIT compilation, etc.) is eliminated and,
  crucially, the CPU may execute the PE's tasks itself if needed, as the
  sketch below illustrates.

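The following minimal C sketch uses hypothetical names throughout
(`pe_try_enqueue` and `dispatch` stand in for real PE hardware and a real
scheduler): because PEs and CPU share one ISA, the very same compiled
function can either be queued to a PE or simply called by the CPU, with
no second binary and no JIT step.

```c
/* Minimal sketch with hypothetical names: pe_try_enqueue stands in for
 * real PE hardware.  The same compiled function can be queued to a PE
 * or executed directly by the CPU, because both share one ISA. */
#include <stdio.h>

typedef void (*task_fn)(void *arg);

/* Stand-in for real hardware: always report "no PE free", forcing the
 * CPU fallback path.  A real system would hand the PE a pointer to the
 * very same machine code. */
static int pe_try_enqueue(task_fn fn, void *arg)
{
    (void)fn; (void)arg;
    return -1;
}

static void dispatch(task_fn fn, void *arg)
{
    if (pe_try_enqueue(fn, arg) == 0)
        return;     /* a PE will run the task */
    fn(arg);        /* otherwise the CPU runs the identical binary */
}

static void example_task(void *arg)
{
    (void)arg;
    puts("task executed");
}

int main(void)
{
    dispatch(example_task, NULL);
    return 0;
}
```
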
**Roadmap summary of Advanced SVP64**