Many thanks to Michael Larabel, who has been writing early articles on this project even before we had a chance to set up this pre-launch page. What follows are some of my observations and responses about the articles.

#### List of Phoronix Articles

For the last few months, Michael has been covering various aspects of this project. The first article covered a lot of the technical details, the second article covered an announcement of Kazan, which implements the Vulkan 3D API, and the most recent article picks up on [the first project update](why-make-a-quad-core-64-bit-soc-surely-there-are-enough-already):

- [There's A New Libre GPU Effort Building On RISC-V, Rust, LLVM & Vulkan](https://www.phoronix.com/scan.php?page=news_item&px=Libre-GPU-RISC-V-Vulkan) (posted 28 September 2018)
- [The Kazan Vulkan CPU/Software-Based Implementation Being Rewritten In Rust](https://www.phoronix.com/scan.php?page=news_item&px=Kazan-Vulkan-Rust) (posted 04 October 2018)
- [The EOMA68 Libre Computer Developer Wants To Tackle A Quad-Core RISC-V Libre SoC Design](http://www.phoronix.com/scan.php?page=news_item&px=Quad-Core-Libre-SoC-Proposal) (posted 29 November 2018)

There has been quite a lot going on, including an enormous amount of planning spanning six to eight months, so there are quite a few catch-up updates to write. It's worthwhile doing one that incorporates responses to Michael and to some of the people who also kindly asked questions and made comments on the [Phoronix Forum](https://www.phoronix.com/forums/node/1064199).

#### Comments and Responses

I have no illusions about the cost of development of this project: it's going to be somewhere north of USD $6 million, with contingency of up to USD $10 million. This is just how it is. Interestingly, that means there's provision both for attracting investment and really, really good talent, and for properly paying that talent. The project's origins reflect what can be achieved with the current resources.
I've been kindly sponsored with a ZC706 FPGA board (worth over USD $2,500), which will allow one major hurdle to be cleared that will meet the criteria of many investors: making sure the design is FPGA proven.

Secondly, I note Michael is a bit incredulous about the goal of achieving mobile-class 3D performance. It's actually extremely modest: 100 million pixels per second, 30 million triangles per second, and around 5 to 6 GFLOPs. These statistics were taken from the benchmarks for Vivante's GC800. Achieving these kinds of numbers is dead easy. Achieving them within a power envelope of under 2.5 watts? Not so easy!

To that end, I spent a considerable amount of time speaking to Jeff Bush, who developed [Nyuzi](https://www.phoronix.com/scan.php?page=news_item&px=LGPL-GPGPU-NyuziProcessor). Jeff's work is fascinating and extremely valuable because, despite Nyuzi having such low 3D performance, the technical documentation and academic analysis of *why* that performance is so low is absolutely, absolutely critical. The [paper](http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf) that Jeff co-authored makes a comparison of software and hardware rasterisation, and he actually developed a fixed-function hardware renderer called [ChiselGPU](https://github.com/jbush001/ChiselGPU) in order to do the comparisons.

One fascinating insight that came out of Jeff's work was that just getting data through the L1/L2 cache has a massive impact on power consumption. A way to deal with that is to increase the number of registers in the design until the data being processed (for example, a tile, or an inter-dependent four-wide bank of 4x3 floating-point numbers) can all fit into the register file, so that it does not need to be pushed down to the L1 cache and back. Some GPUs have a "scratch RAM" area to deal with this.
Staggeringly, even for (or, especially for) a mobile-class GPU, we had to increase the register file size to a whopping 128 64-bit entries, which can be broken down into **256** 32-bit single-precision floating-point entries! What are we *doing*?! This is supposed to be a modest design! A little digging around the Internet reveals that even mobile-class GPUs genuinely have this number of registers.

More than that, though, it turns out that we may have a hidden advantage through implementing Kazan as a Vulkan Driver. In [this mailing list discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000065.html), which follows on from a proposal on the LLVM mailing list about making matrices a first-class type, Jacob informed me that the Vulkan API also passes in large batches of data that contain matrices, and also arrays of data structures that need to be processed together. The problem here is that you want the elements of the arrays (or the matrix) to be processed as if they were linear, preferably without having to move them around. Matrix multiplication, for example, typically requires the second matrix to be transposed (X swapped with Y) in order to access the elements in a linear fashion.

What we decided to do instead was to add [1D/2D/3D data shaping](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000087.html). The elements *stay in-place* in the original registers; however, a "remapper" engine makes them appear, as far as the parallel (SIMD/Vector) engine is concerned, as if the registers are contiguous. In the case of transposing, register numbers 0 1 2 3 / 4 5 6 7 / 8 9 10 11 get "remapped" to 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11 without the need for "MV" instructions. Thanks to Mitch Alsup, the designer of the 66000 ISA, we learned that this re-invented wheel has [also been implemented](https://groups.google.com/d/msg/comp.arch/bGBeaNjAKvc/_vbqyxTUAQAJ) in production GPUs and vectorisation systems.
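
To make the transposing case concrete, here is a minimal Python sketch of the index arithmetic only. It is not the project's hardware or ISA encoding: the names `transpose_remap` and `read_remapped` are hypothetical, and the register file is modelled as a plain list. It assumes the twelve registers hold a 3x4 block laid out row by row, which is how the numbers in the example above read.

```python
def transpose_remap(rows, cols):
    """Offsets into a row-major rows x cols register block, visited column by column.

    Hypothetical helper: it models the remapper's index arithmetic, nothing more.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def read_remapped(regfile, base, rows, cols):
    """Read registers in remapped order; the register contents are never moved."""
    return [regfile[base + offset] for offset in transpose_remap(rows, cols)]


# Twelve registers holding a 3x4 block, stored row by row (an assumption).
regfile = [float(n) for n in range(12)]

order = transpose_remap(rows=3, cols=4)
print(order)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

# The vector engine would see this sequence as if it were contiguous.
transposed_view = read_remapped(regfile, base=0, rows=3, cols=4)
```

The point of the sketch is that only the order of access changes: `regfile` is left untouched, which is the software analogue of avoiding the "MV" instructions.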
The point is that, by picking Vulkan and implementing both the hardware design and the software at the same time, we are both constrained *and guided* towards a successful design. In addition, as we know from previous successful (truly) open projects, the very fact you are at liberty to talk about what you're doing (as compared to a secretive proprietary company) means that people with specialist expertise are more than happy to come forward and comment, and help guide you away from areas that have caught out billion-dollar companies in the past. The project thus becomes a *synthesis of the expertise and efforts of much more than just the people who are implementing it*.

Just having the opportunity to do that is extremely humbling. Mitch Alsup, the designer of the famous Motorola 68000 series of processors, is giving us some feedback and input! Like... wow! For example, [he made an extremely valuable recommendation](https://libre-riscv.org/3d_gpu/microarchitecture/) on how to save on register file space, only needing 1R1W (1 x read-port, 1 x write-port) SRAM, by stretching out the pipeline phases to load operands sequentially rather than in parallel. I cannot express how grateful I am for his input, and for that of all the other people who have helped.

So, given this community, I believe we're in good hands. Ultimately, what is being presented here is more of an opportunity for anyone who has wanted something like this to succeed (or even exist), empowering them to go from "I wish someone would do this" to "I can help make it happen." This is one of the reasons why, if I am honest, I get slightly aggravated by people who write, "oh this project could not possibly succeed" or "this person could not possibly achieve this goal," as such comments entirely miss the point.
As this update is quite long, I'll answer more on the [Phoronix comments](https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1064199-the-eoma68-libre-computer-developer-wants-to-tackle-a-quad-core-risc-v-libre-soc-design). However, I particularly wanted to address one comment here: *"This religious war target seems to be way off based on skill set."* I believe that one aspect of that comment has been addressed above: as it's a libre project, unlike a proprietary company we're at liberty to get out on the Internet and ask people's advice before even committing to a design.

The second aspect is the very silly "religious war" implication. It's absolutely nothing of the kind. What many people do not know about me is that I very, very deliberately pick projects that nobody else is doing. I pick projects that make an ethical difference. Projects that, if successful, would make many people's lives much better, less painful. It has absolutely sod-all to do with "religious frothing fervour" (foam, foam).

Many companies choose to make ethical compromises in order to make a profit. People are finally beginning to wake up to the consequences of this kind of concentration of financial and informational power. In India, people have been *murdered* based on viral WhatsApp hearsay. In the USA, democratic elections have been interfered with (e.g., Cambridge Analytica). I could very easily go to any of these massively-unethical corporations, and make an absolute fortune in the process of empowering them to do a hundred more Cambridge Analyticas. *I choose not to do so.*

There are plenty of companies that make decisions without a moral compass, because, financially, it is easier to do so. And, more poignantly, it is legally permissible and *actively encouraged* by legal frameworks, tax incentives, and Government-sanctioned monopolies known by the name "patents." What I am doing here is to demonstrate that none of that is necessary.
That it is possible to design - and get funding for - a desirable product that happens also to be ethical. This is why the goal is as it is: a mobile-class processor, because that's the kind of product that could sell in large volumes at around the USD $4 mark. You won't see any corporation taking on such a goal, as they're required to prioritise profits over ethics. So, it's down to you, if you want this project to succeed, to help make it happen.