From: Luke Kenneth Casson Leighton
Date: Fri, 30 Nov 2018 03:42:07 +0000 (+0000)
Subject: add response to michael larabel article
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=f0d8543c97896eef0e5f687a083ddbbfd2525fa9;p=crowdsupply.git

add response to michael larabel article
---

diff --git a/updates/002_201830nov_phoronix_articles.mdwn b/updates/002_201830nov_phoronix_articles.mdwn
new file mode 100644
index 0000000..0751c29
--- /dev/null
+++ b/updates/002_201830nov_phoronix_articles.mdwn
@@ -0,0 +1,154 @@
+Many thanks to Michael Larabel, who has been writing early articles
+on this project since before we had a chance to get this pre-launch stage
+up and running. He picked up on the first update
+[here](http://www.phoronix.com/scan.php?page=news_item&px=Quad-Core-Libre-SoC-Proposal). His first article covered a lot more of the
+[technical details](https://www.phoronix.com/scan.php?page=news_item&px=Libre-GPU-RISC-V-Vulkan), and the second covered an announcement of
+[Kazan](https://www.phoronix.com/scan.php?page=news_item&px=Kazan-Vulkan-Rust),
+which implements the Vulkan 3D API.
+
+There has been quite a lot going on, including an enormous amount of
+planning over the past six to eight months, so there are quite
+a few catch-up updates to write. It's worthwhile doing one that incorporates
+responses to Michael and to some of the people who also kindly asked
+questions and made comments on the
+[Phoronix Forum](https://www.phoronix.com/forums/node/1064199).
+
+I have no illusions about the cost of development of this project: it's
+going to be somewhere north of USD $6 million, with contingency of up
+to USD $10 million. This is just how it is. Interestingly, what that
+means is that there's provision both for investment and for attracting
+really, really good talent, and for properly paying for it.
+Where the project has started from is what can be achieved with the
+current resources.
+I've been kindly sponsored with a ZC706 FPGA board (worth over USD $2,500),
+which will allow one major hurdle to be cleared that will meet the criteria
+of many investors: making sure that the design is FPGA proven.
+
+Secondly, Michael, I note some incredulity at the goal of meeting the
+target of mobile-class 3D performance. It's actually extremely modest:
+100 million pixels per second, 30 million triangles per second, and around
+5 to 6 GFLOPs. These statistics were taken from the benchmarks for
+Vivante's GC800. Achieving these kinds of numbers is dead easy. Achieving
+them within a power envelope of under 2.5 watts? Not so easy!
+
+So here, what I did was spend a considerable amount of time speaking to
+Jeff Bush, who developed
+[Nyuzi](https://www.phoronix.com/scan.php?page=news_item&px=LGPL-GPGPU-NyuziProcessor).
+Jeff's work is fascinating and extremely valuable because, despite its
+3D performance being so low, the technical documentation and academic analysis
+of *why* that performance is so low are absolutely, absolutely critical.
+The [paper](http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf) that
+Jeff co-authored makes a comparison of software and hardware rasterisation,
+and he actually developed a fixed-function hardware renderer called
+[ChiselGPU](https://github.com/jbush001/ChiselGPU) in order to do
+comparisons.
+
+One fascinating insight that came out of Jeff's work was that just getting
+data through the L1/L2 cache has a massive impact on power consumption.
+A way to deal with that is to increase the number of registers in the
+design until such time as the data being processed (a tile for example,
+or an inter-dependent 4-wide bank of 4x3 floating-point numbers) can
+all fit into the register file, so as not to need to be pushed out
+to the L1 cache and back. Some GPUs have a "scratch RAM" area to deal
+with this.
+Staggeringly, even for (or, especially for) a mobile-class GPU, we had to
+increase the register file size to a whopping 128 64-bit entries, which can
+be broken down into **256** 32-bit single-precision floating-point entries!
+What are we *doing*! This is supposed to be a modest design!
+
+A little digging around the Internet reveals that even mobile-class GPUs
+genuinely have this number of registers. More than that, though, it
+turns out that we may have a hidden advantage through implementing
+Kazan as a Vulkan Driver. In this
+[discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000065.html),
+which follows on from a proposal on the LLVM mailing list about making
+Matrices a first-class type, Jacob informed me that the Vulkan API also
+passes in large batches of data that contain Matrices, and also arrays of
+data structures that need to be processed together. The problem here
+is that you want the elements of the arrays (or the Matrix) to be
+processed as if they were linear, preferably without having to move
+them around. Matrix multiplication for example typically requires the
+2nd matrix to be transposed (X swapped with Y) in order to access the
+elements in a linear fashion. What we decided to do instead was to
+add [1D/2D/3D data shaping](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000087.html). The elements *stay in-place* in the
+original registers; however, a "remapper" engine makes them appear,
+as far as the parallel (SIMD/Vector) engine is concerned, as if the
+registers are contiguous. Register numbers 0 1 2 3 / 4 5 6 7 / 8 9 10 11
+get "remapped" to 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11 without the need
+for "MV" instructions. Thanks to Mitch Alsup, the designer of the
+66000 ISA, we learned that this re-invented wheel has
+[also been implemented](https://groups.google.com/d/msg/comp.arch/bGBeaNjAKvc/_vbqyxTUAQAJ)
+in production GPUs and Vectorisation Systems.
+
+The point is that by picking Vulkan, and implementing both the hardware
+design and the software at the same time, we are both constrained
+*and guided* towards a successful design. In addition, as we know from
+previous successful (truly) open projects, the very fact that you are
+at liberty to talk about what you're doing (as compared to a secretive
+proprietary company) means that people with specialist expertise are
+more than happy to come forward and comment, and help guide you away
+from areas that have caught out billion-dollar companies in the past.
+
+The project thus becomes a *synthesis of the expertise and efforts
+of many more than just the people who are implementing it*.
+
+Just having the opportunity to do that is extremely humbling. Mitch Alsup,
+the designer of the famous Motorola 68000 series of processors,
+is giving us some feedback and input! Like... wow! For example,
+he made an extremely valuable recommendation
+[here](https://libre-riscv.org/3d_gpu/microarchitecture/) on how
+to save on register file space, only needing 1R1W (1x read-port,
+1x write-port) SRAM, by stretching out the pipeline phases to
+load operands sequentially rather than in parallel. I cannot express
+how grateful I am for his input, and to all the other people who
+have helped.
+
+So I believe we're in good hands, here. Ultimately, what is being
+presented here is more of an opportunity for anyone who has wanted
+something like this to succeed (or even exist), empowering them to
+go from "I wish someone would do this" to "I can help make it happen".
+This is one of the reasons why, if I am honest, I get slightly aggravated
+by people who write, "oh this project could not possibly succeed"
+or "this person could not possibly achieve this goal" as such comments
+entirely miss the point.
+
+As this update is quite long, I'll answer more on the Phoronix
+[comments](https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1064199-the-eoma68-libre-computer-developer-wants-to-tackle-a-quad-core-risc-v-libre-soc-design);
+however, I particularly wanted to address one comment here:
+*"This religious war target seems to be way off based on skill-set."*
+I believe that one aspect of that comment has
+been addressed above: as it's a libre project, unlike a proprietary
+company we're at liberty to get out on the Internet and ask people's
+advice before committing even to a design.
+
+The second aspect is the very silly "Religious War" implication. It's
+absolutely nothing of the kind. What many people do not know about me
+is that I pick projects that nobody else is doing, very very deliberately.
+I pick projects that make an ethical difference. That, if successful,
+would make many people's lives much better, less painful. It has absolutely
+sod-all to do with "Religious frothing fervour" (foam, foam).
+
+Many companies choose to make ethical compromises in order to make a profit.
+People are finally beginning to wake up to the consequences of this kind
+of concentration of financial and informational power. In India,
+people have been *murdered* based on WhatsApp viral hearsay. In the USA,
+democratic elections have been interfered with (Cambridge Analytica).
+I could very easily go to any of these massively-unethical Corporations,
+and make an absolute fortune in the process of empowering them to do a
+hundred more Cambridge Analyticas. *I choose not to do so*.
+
+So there are plenty of companies that make decisions without
+a moral compass, because, financially, it is easier to do so. And, more
+poignantly, it is legally permissible and *actively encouraged* by
+legal frameworks, tax incentives and Government-sanctioned monopolies
+known by the name "patents".
+
+What I am doing here is to demonstrate that none of that is necessary.
+That it is possible to design - and get funding for - a desirable
+product that happens also to be ethical. This is why the goal is
+as it is: a mobile-class processor, because that's the kind of product
+that could sell in large volumes at around the USD $4 mark.
+You won't see any Corporation taking on such a goal, as they're required
+to prioritise profits over ethics. So it's down to you,
+if you want this project to succeed, to help make it happen.
+
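
As a rough illustration of the register remapping described in the update (register numbers 0 1 2 3 / 4 5 6 7 / 8 9 10 11 presented to the vector engine as 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11), here is a minimal software sketch. It only models the index arithmetic: the name `remap_2d`, its parameters and the row-major layout are assumptions made for this example, not the hardware design or the scheme from the linked mailing-list post.

```python
def remap_2d(base, rows, cols):
    """Hypothetical helper: list the register numbers in the order a 2D
    "remapper" would present them, walking a row-major rows x cols block
    column by column (i.e. as if it had been transposed)."""
    return [base + r * cols + c for c in range(cols) for r in range(rows)]


# Twelve registers holding a 3x4 block, laid out row by row.
regfile = [float(i * 10) for i in range(12)]

# Order in which the vector engine would see the registers.
order = remap_2d(base=0, rows=3, cols=4)

# The engine reads *through* the remap; the register file is never modified,
# so no "MV" style copies are needed to obtain the transposed view.
view = [regfile[i] for i in order]

print(order)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

The design point the sketch tries to capture is that only the sequence of register indices changes, while the data stays where it was written.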