update ideas

[crowdsupply.git] / updates / 020_2019aug28_intriguing_ideas.mdwn
diff --git a/updates/020_2019aug28_intriguing_ideas.mdwn b/updates/020_2019aug28_intriguing_ideas.mdwn

index 589bf09a6da15ba711111b458a78c8a3512855be..fb2b9339c0ecf034e9426811b9b75614ea697577 100644
--- a/updates/020_2019aug28_intriguing_ideas.mdwn
+++ b/updates/020_2019aug28_intriguing_ideas.mdwn
@@ -1,10 +1,13 @@
-Intriguing Ideas
+# Intriguing Ideas
  
  Pixilica starts a 3D Open Graphics Alliance initiative; 
  We decide to go with a "reconfigurable" pipeline;
+Seven additional EUR 50,000 NLNet Grant proposals submitted.
  
  # The possibility of a 3D Open Graphics Alliance
  
+{https://youtu.be/HeVz-z4D8os}
+
  At SIGGRAPH 2019 this year there was a very interesting BoF, where the
  [idea was put forward]
  (https://www.pixilica.com/forum/event/risc-v-graphical-isa-at-siggraph-2019/p-1/dl-5d62b6282dc27100170a4a05)
@@ -17,9 +20,9 @@ attention at the BoF.
  The current 3D GPU designs -  NVIDIA, AMD, Intel, are hugely optimised
  for mass volume appeal. Niche markets, by virtue of the profit
  opportunities being lower or even negative given the design choices of
-the incumbents, are inherently penalised. Not only that but the source
-code of the 3D engines is proprietary, meaning that anything outside of
-what is dictated by the incumbents is out of the question.
+the incumbents, are inherently penalised.  Not only that: whilst things are
+slowly changing due to ongoing multi-man-year reverse-engineering efforts,
+3D driver source code is often proprietary as well.
  
  At the BoF, one attendee described how they are implementing *transparent*
  shader algorithms. Most shader hardware provides triangle algorithms that
@@ -36,29 +39,36 @@ range of architectures and requirements: all the way from small embedded
  softcores, to embedded GPUs for use in mobile processors, to HPC servers
  to high end Machine Learning and Robotics applications.
  
-One interesting thing that has to be made clear - the lesson from Nyuzi
-and Larrabee - is that a good Vector Processor does **not** automatically
-make a good 3D GPU. Jeff Bush designed Nyuzi very specifically to
-replicate the Larrabee team's work.  By deliberately not including custom
-3D Hardware Accelerated Opcodes, Nyuzi has only 25% the performance of a modern
-GPU consuming the same amount of power.  Put another way: if you want to use
-a pure Vector Engine to get the same performance as a commercially-competitive
-GPU, you need *four times* the power consumption and four times the silicon
-area.
-
-Thus we simply cannot use the upcoming RISC-V Vector Extension, or even
-SimpleV, and expect to automatically have a commercially competitive
-3D GPU. It takes texture opcodes, Z-Buffers, pixel conversion, Linear
-Interpolation, Trascendentals (sin, cos, exp, log), and much more, all
-of which has to be designed, thought through, implemented *and then used
-behind a suitable API*.
+One interesting thing that has to be made clear - the lesson from
+Nyuzi and Larrabee - is that a good Vector Processor does **not**
+automatically make a good 3D GPU. Jeff Bush designed Nyuzi very
+specifically to replicate the Larrabee team's work: in particular, their
+use of a recursive software-based tiling algorithm.  By deliberately
+not including custom 3D Hardware Accelerated Opcodes, Nyuzi has only
+25% of the performance of a modern GPU consuming the same amount of power.
+Put another way: if you want to use a pure Vector Engine to get the same
+performance as a commercially-competitive GPU, you need *four times*
+the power consumption and four times the silicon area.
+
+Thus we simply cannot use an off-the-shelf Vector extension such as the
+upcoming RISC-V Vector Extension, or even SimpleV, and expect to
+automatically have a commercially competitive 3D GPU. It takes texture
+opcodes, Z-Buffers, pixel conversion, Linear Interpolation, Transcendentals
+(sin, cos, exp, log), and much more, all of which has to be designed,
+thought through, implemented *and then used behind a suitable API*.
  
  In addition, given that the Alliance is to meet the needs of "unusual"
  markets, it is no good creating an ISA that has such a high barrier to
  entry and such a power-performance penalty that it inherently excludes 
  the very implementors it is targetted at, particularly in Embedded markets.
  
-https://youtu.be/HeVz-z4D8os
+Thus we need a Hybrid Architecture, not just to reduce complexity, not
+just to meet Libre criteria, but to serve the long tail of 3D use-cases
+and kick-start some real innovation.
+These were the challenges discussed at the first
+[meetup](https://www.meetup.com/Bay-Area-RISC-V-Meetup/events/264231095/)
+at Western Digital's Milpitas HQ. Experts in 3D at the Meetup were really
+enthusiastic and praised this approach.
  
  # Reconfigurable Pipelines
  
@@ -88,8 +98,9 @@ It turns out that by using what is termed "transparent latches" that it
  is possible to do precisely that.  The advantages are enormous and were
  described in detail on comp.arch
  
-https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/SY2F9Hd8AQAJ  
-Earlier in that thread, someone kindly pointed out that IBM published
+Earlier in
+[this thread](https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/SY2F9Hd8AQAJ),
+someone kindly pointed out that IBM published
  papers on the technique.  Basically, the latches normally present in the
  pipeline have a combinatorial "bypass" in the form of a Mux. The output
  is dynamically selected from either the input *or* the input after it
@@ -107,3 +118,42 @@ impact on instruction latency.
  It's a fantastic idea that will allow us to reconfigure the processor
  to reach a 1.5GHz clock rate for high performance bursts.
  
+# NLNet Funding proposals
+
+The next step is to put in half a dozen NLNet Funding proposals. No,
+literally:
+[seven new proposals](https://libre-riscv.org/nlnet_proposals/),
+each for EUR 50,000. One for gcc, one for a port of MESA RADV to the
+new processor, another for writing experimental assembly code to go into
+libswscale, libx264 etc. ultimately for use in VLC and ffmpeg and so on.
+
+Best of all, two for actually doing a test ASIC: one working with
+chips4makers, the other with lip6.fr. It turns out that 180nm ASIC shuttle
+services cost only USD 600 per square mm, and we can get away with around
+20 sq.mm, which comes to about USD 12,000 for an estimated 800,000 gates.
+
+At that low cost, we can iterate before going to lower geometries, plus
+actually have something which, even at 350MHz, if it were dual issue,
+would be a reasonable saleable product in its own right.  The only thing
+we have to watch out for, there, is that it will be a bit of a monster
+so power consumption is going to be high at 350MHz. Still, for a first
+ASIC ever, it's just exciting to think that it's possible at all.
+
+Regarding the NLNet proposals: we need people! In particular, we need two
+EU Citizens to come forward, one for gcc and another for the MESA/RADV
+port. Thanks to [NGI.eu](https://ngi.eu), NLNet has received its money
+under the EU Horizon 2020 Programme, so to satisfy its backers'
+requirements at least one EU Citizen has to be part of the proposal.
+Please do contact me for details. There's no contract or obligation,
+because these are charitable donations.
+
+In addition, if anyone wants to receive tax deductible charitable
+donations direct from NLNet for working on aspects of this project,
+do get in touch: there is plenty to do.  Application reviews start in 2
+weeks; we will hear from NLnet by December as to what has been approved,
+and will be able to expand the project scope around January 2020.
+
+Also remember, if you work for a Corporation that could financially
+benefit from this project being a reality, sponsorship, via NLNet,
+is tax deductible because it is a charitable donation.
+