From: Joshua Harlan Lifton
Date: Tue, 4 Dec 2018 20:07:58 +0000 (-0800)
Subject: Copy edit second update
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=43c16fe9797e5217b9d0005ab59d7ed002b9ea6e;p=crowdsupply.git

Copy edit second update
---

diff --git a/updates/002_201830nov_phoronix_articles.mdwn b/updates/002_201830nov_phoronix_articles.mdwn
index 0751c29..41d97b6 100644
--- a/updates/002_201830nov_phoronix_articles.mdwn
+++ b/updates/002_201830nov_phoronix_articles.mdwn
@@ -1,89 +1,105 @@
-Many thanks to Michael Larabel as he has been writing early articles
-on this project before we had a chance to get this pre-launch stage
-up and running, he picked up on the first update
-[here](http://www.phoronix.com/scan.php?page=news_item&px=Quad-Core-Libre-SoC-Proposal). The first article covered a lot more of the
-[technical details](https://www.phoronix.com/scan.php?page=news_item&px=Libre-GPU-RISC-V-Vulkan), and the second covered an announcement of
-[Kazan](https://www.phoronix.com/scan.php?page=news_item&px=Kazan-Vulkan-Rust),
-which implements the Vulkan 3D API.
+Many thanks to Michael Larabel, who has been writing early articles on
+this project even before we had a chance to set up this pre-launch
+page. What follows are some of my observations and responses about the
+articles.
+
+#### List of Phoronix Articles
+
+For the last few months, Michael has been covering various aspects of
+this project.
The first article covered a lot of the technical +details, the second article covered an announcement of Kazan, which +implements the Vulkan 3D API, and the most recent article picks up on +[the first project +update](why-make-a-quad-core-64-bit-soc-surely-there-are-enough-already): + +- [There's A New Libre GPU Effort Building On RISC-V, Rust, LLVM & Vulkan](https://www.phoronix.com/scan.php?page=news_item&px=Libre-GPU-RISC-V-Vulkan) (posted 28 September 2018) +- [The Kazan Vulkan CPU/Software-Based Implementation Being Rewritten In Rust](https://www.phoronix.com/scan.php?page=news_item&px=Kazan-Vulkan-Rust) (posted 04 October 2018) +- [The EOMA68 Libre Computer Developer Wants To Tackle A Quad-Core RISC-V Libre SoC Design](http://www.phoronix.com/scan.php?page=news_item&px=Quad-Core-Libre-SoC-Proposal) (posted 29 November 2018) There has been quite a lot going on, including an enormous amount of -planning for nearly six to eight months going on, so there are quite +planning for nearly six to eight months, so there are quite a few catch-up updates to write. It's worthwhile doing one that incorporates responses to Michael and to some of the people who also kindly asked questions and made comments on the [Phoronix Forum](https://www.phoronix.com/forums/node/1064199). -I have no illusions about the cost of development of this project: it's -going to be somewhere north of USD $6 million, with contingency of up -to USD $10 million. This is just how it is. What that means is, -interestingly, it means that there's provision for both investment and -also to attract really, really good talent, and to properly pay for it. -Where the project has started from is what can be achieved with the -current resources. I've been kindly sponsored with a ZC706 FPGA board -(worth over USD $2,500), which will allow one major hurdle to be cleared -that will meet the criteria of many investors: making sure that the -design is FPGA proven. 
- -Secondly, Michael, I note some incredulity at the goal of meeting the -target of mobile-class 3D performance. It's actually extremely modest: -100 million pixels per second, 30 million triangles per second, and around +#### Comments and Responses + +I have no illusions about the cost of development of this project: +it's going to be somewhere north of USD $6 million, with contingency +of up to USD $10 million. This is just how it is. Interestingly, +that means there's provision for both attracting investment and +really, really good talent, and to properly pay for that talent. The +project's origins reflect what can be achieved with the current +resources. I've been kindly sponsored with a ZC706 FPGA board (worth +over USD $2,500), which will allow one major hurdle to be cleared that +will meet the criteria of many investors: making sure the design is +FPGA proven. + +Secondly, I note Michael is a bit incredulous of the goal of achieving +mobile-class 3D performance. It's actually extremely modest: 100 +million pixels per second, 30 million triangles per second, and around 5 to 6 GFLOPs. These statistics were taken from the benchmarks for -Vivante's GC800. Achieving these kinds of numbers is dead easy. Achieving -them within a power envelope of under 2.5 watts? Not so easy! +Vivante's GC800. Achieving these kind of numbers is dead easy. +Achieving them within a power envelope of under 2.5 watts? Not so +easy! -So here, what I did was, spend a considerable amount of time speaking to -Jeff Bush, who developed +To that end, I spent a considerable amount of time speaking to Jeff +Bush, who developed [Nyuzi](https://www.phoronix.com/scan.php?page=news_item&px=LGPL-GPGPU-NyuziProcessor). -Jeff's work is fascinating and extremely valuable because despite it being -such low 3D peformance, the technical documentation and academic analysis -of *why* that performance is so low is absolutely, absolutely critical. 
-The [paper](http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf) that -Jeff co-authored makes a comparison of software and hardware rasterisation, -and he actually developed a fixed-function hardware renderer called -[ChiselGPU](https://github.com/jbush001/ChiselGPU) in order to do -comparisons. +Jeff's work is fascinating and extremely valuable because, despite it +being such low 3D peformance, the technical documentation and academic +analysis of *why* that performance is so low is absolutely, absolutely +critical. The +[paper](http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf) that +Jeff co-authored makes a comparison of software and hardware +rasterisation, and he actually developed a fixed-function hardware +renderer called [ChiselGPU](https://github.com/jbush001/ChiselGPU) in +order to do the comparisons. One fascinating insight that came out of Jeff's work was that just getting data through the L1/L2 cache has a massive impact on power consumption. A way to deal with that is to increase the number of registers in the -design until such time as the data being processed (a tile for example, -or an inter-dependent 4-wide bank of 4x3 Floating-point numbers) can +design until such time as the data being processed (for example, a tile, +or an inter-dependent four-wide bank of 4x3 floating-point numbers) can all fit into the register file, so as not to need to be pushed back down to the L1 cache and back. Some GPUs have a "scratch RAM" area to deal with this. Staggeringly, even for (or, especially for) a mobile-class GPU, we had to increase the register file size to a whopping 128 64-bit entries, that can be broken down into **256** 32-bit single-precision -floating-point entries! What are we *doing*! This is supposed to be +floating-point entries! What are we *doing*?! This is supposed to be a modest design! -A little digging around the Internet reveals that even mobile-class GPUs -genuinely have this number of registers. 
More than that, though, it -turns out that we may have a hidden advantage through implementing -Kazan as a Vulkan Driver. In this -[discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000065.html) +A little digging around the Internet reveals that even mobile-class +GPUs genuinely have this number of registers. More than that, though, +it turns out that we may have a hidden advantage through implementing +Kazan as a Vulkan Driver. In [this mailing list +discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000065.html) which follows on from a proposal on the LLVM mailing list about making -Matrices a first-class type, Jacob informed me that the Vulkan API also -passes in large batches of data that contains Matrices, and also arrays of -data structures that need to be processed together. The problem here -is that you want the elements of the arrays (or the Matrix) to be -processed as if they were linear, preferably without having to move -them around. Matrix multiplication for example typically requires the -2nd matrix to be transposed (X swapped with Y) in order to access the -elements in a linear fashion. What we decided to do instead was to -add [1D/2D/3D data shaping](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000087.html). The elements *stay in-place* in the -original registers, however a "remapper" engine makes them appear, -as far as the parallel (SIMD/Vector) engine is concerned, as if the -registers are contiguous. Register numbers 0 1 2 3 / 4 5 6 7 / 8 9 10 11 -get "remapped" to 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11 without the need -for "MV" instructions. Thanks to Mitch Alsup, the designer of the -66000 ISA, we learned that this re-invented wheel has -[also been implemented](https://groups.google.com/d/msg/comp.arch/bGBeaNjAKvc/_vbqyxTUAQAJ) -in production GPUs and Vectorisation Systems. 
- -The point is that by picking Vulkan, and implementing both the hardware +matrices a first-class type, Jacob informed me that the Vulkan API +also passes in large batches of data that contain matrices, and also +arrays of data structures that need to be processed together. The +problem here is that you want the elements of the arrays (or the +matrix) to be processed as if they were linear, preferably without +having to move them around. Matrix multiplication, for example, +typically requires the second matrix to be transposed (X swapped with +Y) in order to access the elements in a linear fashion. What we +decided to do instead was to add [1D/2D/3D data +shaping](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000087.html). +The elements *stay in-place* in the original registers, however a +"remapper" engine makes them appear, as far as the parallel +(SIMD/Vector) engine is concerned, as if the registers are contiguous. +In the case of transposing, register numbers 0 1 2 3 / 4 5 6 7 / 8 9 +10 11 get "remapped" to 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11 without the +need for "MV" instructions. Thanks to Mitch Alsup, the designer of +the 66000 ISA, we learned that this re-invented wheel has [also been +implemented](https://groups.google.com/d/msg/comp.arch/bGBeaNjAKvc/_vbqyxTUAQAJ) +in production GPUs and vectorisation systems. + +The point is, that by picking Vulkan and implementing both the hardware design and the software at the same time, we are both constrained *and guided* towards a successful design. 
In addition, as we know from -previous successful (truly) open projects, the very fact that you are +previous successful (truly) open projects, the very fact you are at liberty to talk about what you're doing (as compared to a secretive proprietary company) means that people with specialist expertise are more than happy to come forward and comment, and help guide you away @@ -92,63 +108,62 @@ from areas that have caught out billion-dollar companies in the past. The project thus becomes a *synthesis of the expertise and efforts of much more than just the people who are implementing it*. -Just having the opportunity to do that is extremely humbling. Mitch Alsup, -the designer of the famous Motorola 68000 series of processors, -is giving us some feedback and input! Like... wow! For example, -he made an extremely valuable recommendation -[here](https://libre-riscv.org/3d_gpu/microarchitecture/) on how -to save on register file space, only needing 1R1W (1x read-port, -1x write-port) SRAM, by stretching out the pipeline phases to -load operands sequentially rather than in parallel. I cannot express -how grateful I am for his input, and for all the other people who -have helped. - -So I believe we're in good hands, here. Ultimately, what is being -presented here is more of an opportunity for anyone who has wanted -something like this to succeed (or even exist), empowering them to -go from "I wish someone would do this" to "I can help make it happen". -This is one of the reasons why, if i am honest, I get slightly aggravated -by people who write, "oh this project could not possibly succeed" -or "this person could not possibly achieve this goal" as such comments -entirely miss the point. 
- -As this update is quite long, I'll answer more on the Phoronix -[comments](https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1064199-the-eoma68-libre-computer-developer-wants-to-tackle-a-quad-core-risc-v-libre-soc-design) -however I particularly wanted to address one comment, here: -*"This religious war target seems to be way off based on skill-set."*. +Just having the opportunity to do that is extremely humbling. Mitch +Alsup, the designer of the famous Motorola 68000 series of processors, +is giving us some feedback and input! Like... wow! For example, [he +made an extremely valuable +recommendation](https://libre-riscv.org/3d_gpu/microarchitecture/) on +how to save on register file space, only needing 1R1W (1 x read-port, +1 x write-port) SRAM, by stretching out the pipeline phases to load +operands sequentially rather than in parallel. I cannot express how +grateful I am for his input, and for all the other people who have +helped. + +So, given this community, I believe we're in good hands. Ultimately, +what is being presented here is more of an opportunity for anyone who +has wanted something like this to succeed (or even exist), empowering +them to go from "I wish someone would do this" to "I can help make it +happen." This is one of the reasons why, if I am honest, I get +slightly aggravated by people who write, "oh this project could not +possibly succeed" or "this person could not possibly achieve this +goal," as such comments entirely miss the point. + +As this update is quite long, I'll answer more on the [Phoronix +comments](https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1064199-the-eoma68-libre-computer-developer-wants-to-tackle-a-quad-core-risc-v-libre-soc-design), +however, I particularly wanted to address one comment, here: +*"This religious war target seems to be way off based on skill set."*. 
As you can see above, I believe that one aspect of that comment has been addressed, above: as it's a libre project, unlike a proprietary company we're at liberty to get out on the Internet and ask peoples' -advice before committing even to a design. +advice before even committing to a design. -The second aspect is the very silly "Religious War" implication. It's +The second aspect is the very silly "religious war" implication. It's absolutely nothing of the kind. What many people do not know about me -is that I pick projects that nobody else is doing, very very deliberately. +is that I very, very deliberately pick projects that nobody else is doing. I pick projects that make an ethical difference. That, if successful, many peoples' lives would be much better, less painful. It has absolutely -sod-all to do with "Religious frothing fervour" (foam, foam). +sod-all to do with "religious frothing fervour" (foam, foam). Many companies choose to make ethical compromises in order to make a profit. People are finally beginning to wake up to the consequences of this kind of concentration of financial and informational power. In India, people have been *murdered* based on Whatsapp viral hear-say. In the USA, -democratic elections have been interfered with (Cambridge Analytica). -I could very easily go to any of these massively-unethical Corporations, +democratic elections have been interfered with (e.g., Cambridge Analytica). +I could very easily go to any of these massively-unethical corporations, and make an absolute fortune in the process of empowering them to do a -hundred more Cambridge Analyticas. *I choose not to do so*. +hundred more Cambridge Analyticas. *I choose not to do so.* -So there are plenty of companies that make decisions without +There are plenty of companies that make decisions without a moral compass, because, financially, it is easier to do so. 
And, more poignantly, it is legally permissible and *actively encouraged* by -legal frameworks, tax incentives and Government-sanction monopolies -known by the name "patents". +legal frameworks, tax incentives, and Government-sanction monopolies +known by the name "patents." What I am doing here is to demonstrate that none of that is necessary. That it is possible to design - and get funding for - a desirable product that happens also to be ethical. This is why the goal is as it is: a mobile-class processor, because that's the kind of product that could sell in large volumes at around the USD $4 mark. -You won't see any Corporation taking on such a goal, as they're required -to prioritise profits over ethics. So it's down to you, +You won't see any corporation taking on such a goal, as they're required +to prioritise profits over ethics. So, it's down to you, if you want this project to succeed, to help make it happen. -