From 222dbad5ff73ded09fba7ab36d206d62bc44e6be Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Wed, 27 Jun 2018 10:56:44 +0100
Subject: [PATCH] add libre 3d gpu page

---
 shakti/m_class/libre_3d_gpu.mwdn | 109 +++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100644 shakti/m_class/libre_3d_gpu.mwdn

diff --git a/shakti/m_class/libre_3d_gpu.mwdn b/shakti/m_class/libre_3d_gpu.mwdn
new file mode 100644
index 000000000..fc070176c
--- /dev/null
+++ b/shakti/m_class/libre_3d_gpu.mwdn
@@ -0,0 +1,109 @@
+# Requirements
+
+## GPU size and power
+
+> 1.1. GPU size MUST be < 0.XX mm for ASICs after synthesis with
+> DesignCompiler tool using YY cell library at ZZ nm tech.
+
+basically the power requirement should be at or below around 1 watt
+in 40nm.  beyond 1 watt it becomes... difficult.   size is not
+particularly critical as such but should not be insane.
+
+so here's a table showing embedded cores:
+<https://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidia-geforce-ulp/>
+
+GC800 has (in 40nm):
+
+* 35 million triangles/sec
+* 325 million pixels/sec
+* 6 GFLOPS
+* 1.9mm^2 synthesis area
+* 2.5mm^2 silicon area.
+
+silicon area corresponds *ROUGHLY* with power usage, but PLEASE do
+not take that as absolute, because if you read jeff's nyuzi 2016 paper
+you'll see that getting data through the L1/L2 cache barrier is by far
+and away the biggest eater of power.
+
+note lower down that the numbers for MALI400 are for the *4* core
+version - MALI400-MP4 - whereas jeff and i compared against MALI400
+SINGLE CORE and discovered that nyuzi, if 4 parallel nyuzi cores were
+put together, would reach only 25% of MALI400's performance (in about
+the same silicon area).
+
+## Other
+
+
+* Deadline = 12-18 months
+* The GPU is matched by the Gallium3D driver
+* RTL must be sufficient to run on an FPGA.
+* Software must be licensed under LGPLv2+ or BSD/MIT.
+* Hardware (RTL) must be licensed under BSD or MIT with no
+  "NON-COMMERCIAL" CLAUSES.
+* Any proposals will be competing against Vivante GC800 (using Etnaviv driver).
+* The GPU is integrated (like Mali400). So all that the GPU needs to do
+  is write to an area of memory (framebuffer or area of the framebuffer).
+  the SoC - which in this case has a RISC-V core and has peripherals such
+  as the LCD controller - will take care of the rest.
+* In this architecture, the GPU, the CPU and the peripherals are all on
+  the same AXI4 shared memory bus. They all have access to the same shared
+  DDR3/DDR4 RAM. So as a result the GPU will use AXI4 to write directly
+  to the framebuffer and the rest will be handled by the SoC.
+* The job must be done by a team that shows sufficient expertise to
+  reduce the risk. (Do you mean a team with good CVs? What about if the
+  team shows you an acceptable FPGA prototype? I'm talking about a team
+  of students which do not have big industrial CVs but they know how to
+  handle this job (just like RocketChip or MIAOW or etc...).
+
+response:
+
+> Deadline = ?
+
+about 12-18 months which is really tight.  if an FPGA (or simulation)
+plus the basics of the software driver are at least prototyped by then
+it *might* be ok.
+
+if using nyuzi as the basis it *might* be possible to begin the
+software port in parallel because jeff went to the trouble of writing
+a cycle-accurate simulation.
+
+
+> The GPU must be matched by the Gallium3D driver
+
+that's the *recommended* approach, as i *suspect* it will result in less
+work than, for example, writing an entire OpenGL stack from scratch.
+
+
+> RTL must be sufficient to run on an FPGA.
+
+a *demo* must run on an FPGA as an initial
+
+> Software must be licensed under LGPLv2+ or BSD/MIT.
+
+and no other licenses.  GPLv2+ is out.
+
+> Hardware (RTL) must be licensed under BSD or MIT with no "NON-COMMERCIAL
+> CLAUSES".
+> Any proposals will be competing against Vivante GC800 (using Etnaviv
+> driver).
+
+in terms of price, performance and power budget, yes.  if you look up
+the numbers (triangles/sec, pixels/sec, power usage, die area) you'll
+find it's really quite modest.  nyuzi right now requires FOUR times the
+silicon area of e.g. MALI400 to achieve the same performance as MALI400,
+meaning that the power usage alone would be well in excess of the budget.
+
+> The job must be done by a team that shows sufficient expertise to reduce the
+> risk. (Do you mean a team with good CVs? What about if the team shows you an
+> acceptable FPGA prototype?
+
+that would be fantastic as it would demonstrate not only competence
+but also commitment.  and will have taken out the "risk" of being
+"unknown", entirely.
+
+> Iâm talking about a team of students which do not
+> have big industrial CVs but they know how to handle this job (just like
+> RocketChip or MIAOW or etcâ¦).
+
+ works perfectly for me :)
+
-- 
2.30.2