From: Luke Kenneth Casson Leighton
Date: Thu, 5 May 2022 16:41:44 +0000 (+0100)
Subject: add GPU paragraph
X-Git-Tag: opf_rfc_ls005_v1~2433
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=b195d9580efdbc9cd3c61415161f0a4fec6576d4;p=libreriscv.git

add GPU paragraph
---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index 8323571fe..6e5d14a03 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -28,12 +28,14 @@ First hints are that whilst memory bitcells have not increased in speed
 since the 90s (around 150 mhz), increasing the datapath widths has
 allowed significant apparent speed increases: 3200 mhz DDR4 and even
 faster DDR5, and other advanced Memory interfaces such as HBM, Gen-Z, and OpenCAPI,
-all make an effort, but these efforts are dwarfed by the two nearly
-three orders of magnitude increase in CPU horsepower. Seymour Cray,
-from his amazing in-depth knowledge, predicted that the mismatch would
-become a serious limitation. Some systems at the time of writing are
-approaching a *Gigabyte* of L4 Cache, by way of compensation, and as we
-know from experience even that will be considered inadequate in future.
+all make an effort (all simply increasing the parallel deployment of
+the underlying 150 mhz bitcells), but these efforts are dwarfed by the
+two to nearly three orders of magnitude increase in CPU horsepower.
+Seymour Cray, from his amazing in-depth knowledge, predicted over two
+decades ago that the mismatch would become a serious limitation. Some
+systems at the time of writing are now approaching a *Gigabyte* of L4
+Cache, by way of compensation, and as we know from experience even that
+will be considered inadequate in future.
 
 Efforts to solve this problem by moving the processing closer to or
 directly integrated into the memory have traditionally not gone well:
@@ -48,6 +50,14 @@ for massive wide Baseband FFTs saved it from going under.
 Any "better AI mousetrap" that comes along will quickly render both
 D-Matrix and Graphcore obsolete.
 
+NVIDIA and other GPUs have taken a different approach again: massive
+parallelism with more Turing-complete ISAs in each core, and dedicated
+slower parallel memory paths (GDDR5) suited to the specific tasks of
+3D, Parallel Compute and AI. The complexity of this approach is dwarfed
+only by the amount of money poured into the software ecosystem in order
+to make it accessible, and even then, GPU Programmers are a specialist
+and rare (expensive) breed.
+
 Second hints as to the answer emerge from an article "[SIMD considered
 harmful](https://www.sigarch.org/simd-instructions-considered-harmful/)"
 which illustrates a catastrophic rabbit-hole taken by Industry Giants