update ideas

diff --git a/updates/020_2019aug28_intriguing_ideas.mdwn b/updates/020_2019aug28_intriguing_ideas.mdwn

index 589bf09a6da15ba711111b458a78c8a3512855be..fb2b9339c0ecf034e9426811b9b75614ea697577 100644
--- a/updates/020_2019aug28_intriguing_ideas.mdwn
+++ b/updates/020_2019aug28_intriguing_ideas.mdwn
@@ -1,10 +1,13 @@
-Intriguing Ideas
+# Intriguing Ideas
  
  Pixilica starts a 3D Open Graphics Alliance initiative; 
  We decide to go with a "reconfigurable" pipeline;
+Seven additional EUR 50,000 NLNet Grant proposals submitted.
  
  # The possibility of a 3D Open Graphics Alliance
  
+{https://youtu.be/HeVz-z4D8os}
+
  At SIGGRAPH 2019 this year there was a very interesting BoF, where the
  [idea was put forward]
  (https://www.pixilica.com/forum/event/risc-v-graphical-isa-at-siggraph-2019/p-1/dl-5d62b6282dc27100170a4a05)
@@ -17,9 +20,9 @@ attention at the BoF.
  The current 3D GPU designs -  NVIDIA, AMD, Intel, are hugely optimised
  for mass volume appeal. Niche markets, by virtue of the profit
  opportunities being lower or even negative given the design choices of
-the incumbents, are inherently penalised. Not only that but the source
-code of the 3D engines is proprietary, meaning that anything outside of
-what is dictated by the incumbents is out of the question.
+the incumbents, are inherently penalised.  Not only that: whilst things are
+slowly changing due to ongoing multi-man-year reverse-engineering efforts,
+3D driver source code is often proprietary as well.
  
  At the BoF, one attendee described how they are implementing *transparent*
  shader algorithms. Most shader hardware provides triangle algorithms that
@@ -36,29 +39,36 @@ range of architectures and requirements: all the way from small embedded
  softcores, to embedded GPUs for use in mobile processors, to HPC servers
  to high end Machine Learning and Robotics applications.
  
-One interesting thing that has to be made clear - the lesson from Nyuzi
-and Larrabee - is that a good Vector Processor does **not** automatically
-make a good 3D GPU. Jeff Bush designed Nyuzi very specifically to
-replicate the Larrabee team's work.  By deliberately not including custom
-3D Hardware Accelerated Opcodes, Nyuzi has only 25% the performance of a modern
-GPU consuming the same amount of power.  Put another way: if you want to use
-a pure Vector Engine to get the same performance as a commercially-competitive
-GPU, you need *four times* the power consumption and four times the silicon
-area.
-
-Thus we simply cannot use the upcoming RISC-V Vector Extension, or even
-SimpleV, and expect to automatically have a commercially competitive
-3D GPU. It takes texture opcodes, Z-Buffers, pixel conversion, Linear
-Interpolation, Trascendentals (sin, cos, exp, log), and much more, all
-of which has to be designed, thought through, implemented *and then used
-behind a suitable API*.
+One interesting thing that has to be made clear - the lesson from
+Nyuzi and Larrabee - is that a good Vector Processor does **not**
+automatically make a good 3D GPU. Jeff Bush designed Nyuzi very
+specifically to replicate the Larrabee team's work: in particular, their
+use of a recursive software-based tiling algorithm.  By deliberately
+not including custom 3D Hardware Accelerated Opcodes, Nyuzi has only
+25% of the performance of a modern GPU consuming the same amount of power.
+Put another way: if you want to use a pure Vector Engine to get the same
+performance as a commercially-competitive GPU, you need *four times*
+the power consumption and four times the silicon area.
+
+Thus we simply cannot use an off-the-shelf Vector extension such as the
+upcoming RISC-V Vector Extension, or even SimpleV, and expect to
+automatically have a commercially competitive 3D GPU. It takes texture
+opcodes, Z-Buffers, pixel conversion, Linear Interpolation, Transcendentals
+(sin, cos, exp, log), and much more, all of which has to be designed,
+thought through, implemented *and then used behind a suitable API*.
  
  In addition, given that the Alliance is to meet the needs of "unusual"
  markets, it is no good creating an ISA that has such a high barrier to
  entry and such a power-performance penalty that it inherently excludes 
  the very implementors it is targeted at, particularly in Embedded markets.
  
-https://youtu.be/HeVz-z4D8os
+Thus we need a Hybrid Architecture, not just to reduce complexity, not
+just to meet Libre criteria, but to serve the long tail of 3D use-cases
+and kick-start some real innovation.
+These were the challenges discussed at the first
+[meetup](https://www.meetup.com/Bay-Area-RISC-V-Meetup/events/264231095/)
+at Western Digital's Milpitas HQ. Experts in 3D at the Meetup were really
+enthusiastic and praised this approach.
  
  # Reconfigurable Pipelines
  
@@ -88,8 +98,9 @@ It turns out that by using what is termed "transparent latches" that it
  is possible to do precisely that.  The advantages are enormous and were
  described in detail on comp.arch
  
-https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/SY2F9Hd8AQAJ  
-Earlier in that thread, someone kindly pointed out that IBM published
+Earlier in
+[this thread](https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/SY2F9Hd8AQAJ),
+someone kindly pointed out that IBM published
  papers on the technique.  Basically, the latches normally present in the
  pipeline have a combinatorial "bypass" in the form of a Mux. The output
  is dynamically selected from either the input *or* the input after it
@@ -107,3 +118,42 @@ impact on instruction latency.
  It's a fantastic idea that will allow us to reconfigure the processor
  to reach a 1.5 GHz clock rate for high performance bursts.
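
The bypass idea described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the project's hardware code: the names `Latch`, `step` and `latency`, and the four toy stage functions, are hypothetical and chosen purely for illustration. Each latch's output mux selects either the value captured on the previous clock edge or, when bypassed, the raw input in the same cycle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Latch:
    """Pipeline register with a combinatorial bypass selected by a mux."""
    bypass: bool = False          # True: output follows input in the same cycle
    stored: Optional[int] = None  # value captured on the previous clock edge

    def output(self, data_in: Optional[int]) -> Optional[int]:
        # The mux: pick the raw input, or the previously latched value.
        return data_in if self.bypass else self.stored

    def clock(self, data_in: Optional[int]) -> None:
        self.stored = data_in


def step(stages: List[Callable[[int], int]], latches: List[Latch],
         data_in: Optional[int]) -> Optional[int]:
    """Advance one clock cycle; return the value leaving the last latch."""
    value = data_in
    captured = []
    for stage, latch in zip(stages, latches):
        value = None if value is None else stage(value)
        captured.append(value)
        value = latch.output(value)  # old stored value unless bypassed
    for latch, v in zip(latches, captured):
        latch.clock(v)               # clock edge: all latches capture together
    return value


def latency(stages: List[Callable[[int], int]], latches: List[Latch],
            data_in: int) -> Tuple[int, int]:
    """Return (clock cycles until the result emerges, result)."""
    cycles = 0
    out = step(stages, latches, data_in)
    while out is None:
        cycles += 1
        out = step(stages, latches, None)
    return cycles, out


# Four toy stages (hypothetical): 5 -> 6 -> 12 -> 9 -> 81
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

full = latency(stages, [Latch() for _ in stages], 5)
# Bypass every other latch so that pairs of stages merge into one.
merged = latency(stages, [Latch(bypass=(i % 2 == 0)) for i in range(4)], 5)
print(full, merged)
```

In this model the fully latched pipeline takes four cycles and the merged one takes two, for the same result. The cost, which the model does not capture, is that each merged stage has a longer combinatorial path and so needs a slower clock.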
  
+# NLNet Funding Proposals
+
+The next step is to put in half a dozen NLNet Funding proposals. No,
+literally:
+[seven new proposals](https://libre-riscv.org/nlnet_proposals/),
+each for EUR 50,000. One for gcc, one for a port of MESA RADV to the
+new processor, another for writing experimental assembly code to go into
+libswscale, libx264 etc. ultimately for use in VLC and ffmpeg and so on.
+
+Best of all, two for actually doing a test ASIC: one working with
+chips4makers, the other with lip6.fr. It turns out that 180nm ASIC shuttle
+services cost only USD 600 per square mm, and we can get away with around
+20 sq.mm, which is about USD 12,000 for an estimated 800,000 gates.
+
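The arithmetic behind those figures is easy to check. A minimal sketch, assuming a flat per-area price with no minimum order or setup fee (the function name `shuttle_cost_usd` is made up for this example):

```python
def shuttle_cost_usd(area_sq_mm: int, usd_per_sq_mm: int = 600) -> int:
    """Cost of a shuttle run, assuming a flat price per square mm."""
    return area_sq_mm * usd_per_sq_mm


area_sq_mm = 20          # die area quoted above
estimated_gates = 800_000

cost = shuttle_cost_usd(area_sq_mm)
gates_per_sq_mm = estimated_gates // area_sq_mm

print(cost)             # 12000
print(gates_per_sq_mm)  # 40000, implied by the two quoted figures
```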
+At that low cost, we can iterate before going to lower geometries, and
+actually have something which, even at 350 MHz, if it was dual issue,
+would be a reasonable saleable product in its own right.  The only thing
+we have to watch out for, there, is that it will be a bit of a monster,
+so power consumption is going to be high at 350 MHz. Still, for a first
+ASIC ever, it's just exciting to think that it's possible at all.
+
+Regarding the NLNet proposals: we need people! In particular, we need two
+EU Citizens to come forward, one for gcc and another for the MESA/RADV
+port. Thanks to [NGI.eu](https://ngi.eu), NLNet has received its money
+under the EU Horizon 2020 Programme, so to satisfy its backers'
+requirements at least one EU Citizen has to be part of the proposal.
+Please do contact me for details. There's no contract or obligation,
+because these are charitable donations.
+
+In addition, if anyone wants to receive tax deductible charitable
+donations direct from NLNet for working on aspects of this project,
+do get in touch: there is plenty to do.  Application reviews start in 2
+weeks; we will hear from NLnet by December as to what has been approved,
+and will be able to expand the project scope around January 2020.
+
+Also remember, if you work for a Corporation that could financially
+benefit from this project being a reality, sponsorship, via NLNet,
+is tax deductible because it is a charitable donation.
+