X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=3d_gpu.mdwn;h=4c1282bf43e27580e2c544281d8fca3ee4a174e1;hb=e4eb759cbc77cf32f78d01f13e7d9e9d1f9285e8;hp=5ac25349c67822d4b8b6bf0fcb4d98d937e02059;hpb=211258521e9694e5bbc3ad23420d80cc9f61672f;p=libreriscv.git

diff --git a/3d_gpu.mdwn b/3d_gpu.mdwn
index 5ac25349c..4c1282bf4 100644
--- a/3d_gpu.mdwn
+++ b/3d_gpu.mdwn
@@ -1,184 +1,154 @@
-# RISC-V 3D GPU
-
-at FOSDEM 2018 when Yunsup and the team announced the U540 there was
-some discussion about this: it was one of the questions asked. one of
-the possibilities raised there was that maddog was heading something:
-i've looked for that effort, and have not been able to find it [jon is
-getting quite old, now, bless him. he had to have an operation last
-year. he's recovered well].
-
-also at the Barcelona Conference i mentioned in the
-very-very-very-rapid talk on the Libre RISC-V chip that i have been
-tasked with, that if there is absolutely absolutely no other option,
-it will use Vivante GC800 (and, obviously, use etnaviv). what *that*
-means is that there's a definite budget of USD $250,000 available
-which the (anonymous) sponsor is definitely willing to spend... so if
-anyone can come up with an alternative that is entirely libre and
-open, i can put that initiative to the sponsor for evaluation.
-
-basically i've been looking at this for several months, so have been
-talking to various people (jeff bush from nyuzi [1] and chiselgpu [2],
-frank from gplgpu [3], VRG for MIAOW [4]) to get a feel for what would
-be involved.
-
-* miaow is just an OpenCL engine that is compatible with a subset of
-  AMD/ATI's OpenCL assembly code. it is NOT a GPU. they have
-  preliminary plans to *make* one... however the development process is
-  not open. we'll hear about it if and when it succeeds, probably as
-  part of a published research paper.
-
-* nyuzi is a *modern* "software shader / renderer" and is a
-  replication of the intel larrabee architecture. it explored the
-  concept of doing recursive software-driven rasterisation (as did
-  larrabee) where hardware rasterisation uses brute force and often
-  wastes time and power. jeff went to a lot of trouble to find out
-  *why* intel's researchers were um "not permitted" to actually put
-  performance numbers into their published papers. he found out why :)
-  one of the main facts that jeff's research reveals (and there are a
-  lot of them) is that most of the energy of a GPU is spent getting data
-  each way past the L2/L1 cache barrier, and secondly much of the time
-  (if doing software-only rendering) you have several instruction cycles
-  where in a hardware design you issue one and a separate pipeline takes
-  over (see videocore-iv below)
-
-* chiselgpu was an additional effort by jeff to create the absolute
-  minimum required tile-based "triangle renderer" in hardware, for
-  comparative purposes in the nyuzi raster engine research. synthesis
-  of such a block he pointed out to me would actually be *enormous*,
-  despite appearances from how little code there is in the chiselgpu
-  repository. in his paper he mentions that the majority of the time
-  when such hardware-renderers are deployed, the rest of the GPU is
-  really struggling to keep up feeding the hardware-rasteriser, so you
-  have to put in multiple threads, and that brings its own problems.
-  it's all in the paper, it's fascinating stuff.
-
-* gplgpu was done by one of the original developers of the "Number
-  Nine" GPU, and is based around a "fixed function" design and as such
-  is no longer considered suitable for use in the modern 3D developer
-  community (they hate having to code for it), and its performance would
-  be *really* hard to optimise and extend. however in speaking to jeff,
-  who analysed it quite comprehensively, he said that there were a large
-  number of features (4-tuple floating-point colour to 16/32-bit ARGB
-  fixed functions) that have retained a presence in modern designs, so
-  it's still useful for inspiration and analysis purposes. you can see
-  jeff's analysis here [7]
-
-* an extremely useful resource has been the videocore-iv project [8]
-  which has collected documentation and part-implemented compiler tools.
-  the architecture is quite interesting, it's a hybrid of a
-  Software-driven Vector architecture similar to Nyuzi plus
-  fixed-functions on separate pipelines such as that "take 4-tuple FP,
-  turn it into fixed-point ARGB and overlay it into the tile"
-  instruction. that's done as a *single* instruction to cover i think 4
-  pixels, where Nyuzi requires an average of 4 cycles per pixel. the
-  other thing about videocore-iv is that there is a separate internal
-  "scratch" memory area of size 4x4 (x32-bit) which is the "tile" area,
-  and focussing on filling just that is one of the things that saves
-  power. jeff did a walkthrough, you can read it here [10] [11]
-
-so on this basis i have been investigating a couple of proposals for
-RISC-V extensions: one is Simple-V [9] and the other is a *small*
-general-purpose memory-scratch area extension, which would be
-accessible only on the *other* side of the L1/L2 cache area and *ONLY*
-accessible by an individual core [or its hyperthreads]. small would
-be essential because if a context-switch occurs it would be necessary
-to swap the scratch-area out to main memory (and back).
-general-purpose so that it's useful and useable in other contexts and
-situations.
-
-whilst there are many additional reasons - justifications that make
-it attractive for *general-purpose* usage (such as accidentally
-providing LD.MULTI and ST.MULTI for context-switching and efficient
-function call parameter stack storing, and an accidental
-single-instruction "memcpy" and "memzero") - the primary driver behind
-Simple-V has been as the basis for turning RISC-V into an
-embedded-style (low-power) GPU (and also a VPU).
-
-one of the things that's lacking from RVV is parallelisation of
-Bit-Manipulation. RVV has been primarily designed based on input from
-the Supercomputer community, and as such it's *incredible*.
-absolutely amazing... but only desirable to implementt if you need to
-build a Supercomputer.
-
-Simple-V i therefore designed to parallelise *everything*. custom
-extensions, future extensions, current extensions, current
-instructions, *everything*. RVV, once it's been implemented in gcc
-for example, would require heavy-customisation to support e.g.
-Bit-Manipulation, would require special Bit-Manipulation Vector
-instructions to be added *to RVV*... all of which would need to AGAIN
-go through the Extension Proposal process... you can imagine how that
-would go, and the subsequent cost of maintenance of gcc, binutils and
-so on as a long-term preliminary (or if the extension to RVV is not
-accepted, after all the hard work) even a permanent hard-fork.
-
-in other words once you've been through the "Extension Proposal
-Process" with Simple-V, it need never be done again, not for one
-single parallel / vector / SIMD instruction, ever again.
-
-that would include for example creating a fixed-function 3D "FP to
-ARGB" custom instruction. a custom extension with special 3D
-pipelines would, with Simple-V, not need to also have to worry about
-how those operations would be parallelised.
-
-this is not a new concept: it's borrowed directly from videocore-iv
-(which in turn probably borrowed it from somewhere else).
-videocore-iv call it "virtual parallelism". the Vector Unit
-*actually* has a 4-wide FPU for certain heavily-used operations such
-as ADD, and a ***ONE*** wide FPU for less-used operations such as
-RECIPSQRT.
-
-however at the *instruction* level each of those operations,
-regardless of whether they're heavily-used or less-used they *appear*
-to be 16 parallel operations all at once, as far as the compiler and
-assembly writers are concerned. Simple-V just borrows this exact same
-concept and lets implementors decide where to deploy it, to best
-advantage.
-
-
-> 2. If it’s a good idea to implement, are there any projects currently
-> working on it?
-
-i haven't been able to find any: if you do please do let me know, i
-would like to speak to them and find out how much time and money they
-would need to complete the work.
-
-> If the answer is yes, would you mind mention the project’s name and
-> website?
->
-> If the answer is no, are there any special reasons that nobody not
-> implement it yet?
-
-it's damn hard, it requires a *lot* of resources, and if the idea is
-to make it entirely libre-licensed and royalty-free there is an extra
-step required which a proprietary GPU company would not normally do,
-and that is to follow the example of the BBC when they created their
-own Video CODEC called Dirac [5].
-
-what the BBC did there was create the algorithm *exclusively* from
-prior art and expired patents... they applied for their own patents...
-and then *DELIBERATELY* let them lapse. the way that the patent
-system works, the patents will *still be published*, there will be an
-official priority filing date in the patent records with the full text
-and details of the patents.
-
-this strategy, where you MUST actually pay for the first filing
-otherwise the records are REMOVED and never published, acts as a way
-of preventing and prohibiting unscrupulous people from grabbing the
-whitepapers and source code, and trying to patent details of the
-algorithm themselves just like Google did very recently [6]
-
-* [0] https://www.youtube.com/watch?v=7z6xjIRXcp4
-* [1] https://github.com/jbush001/NyuziProcessor/wiki
-* [2] https://github.com/asicguy/gplgpu
-* [3] https://github.com/jbush001/ChiselGPU/
-* [4] http://miaowgpu.org/
-* [5] https://en.wikipedia.org/wiki/Dirac_(video_compression_format)
-* [6] https://yro.slashdot.org/story/18/06/11/2159218/inventor-says-google-is-patenting-his-public-domain-work
-* [7] https://jbush001.github.io/2016/07/24/gplgpu-walkthrough.html
-* [8] https://github.com/hermanhermitage/videocoreiv/wiki/VideoCore-IV-Programmers-Manual
-* [9] libre-riscv.org/simple_v_extension/
-* [10] https://jbush001.github.io/2016/03/02/videocore-qpu-pipeline.html
-* [11] https://jbush001.github.io/2016/02/27/life-of-triangle.html
-* OpenPiton https://openpiton-blog.princeton.edu/2018/11/announcing-openpiton-with-ariane/
-
-
+# RISC-V 3D GPU / CPU / VPU
+
+Note: this is a **hybrid** CPU, VPU and GPU. It is not, as many news articles
+are implying, a "dedicated exclusive GPU". The option exists to **create**
+a stand-alone GPU product (contact us if this is a product that you want).
+Our primary goal is to design a **complete** all-in-one processor
+(System-on-a-Chip) that happens to include a libre-licensed VPU and GPU.
+
+We seek investors, sponsors, engineers and potential customers, who are
+interested, as a first product, in the creation and use of an entirely
+libre low-power mobile class system-on-a-chip. Comparative benchmark
+performance, pincount and price is the Allwinner A64, except that the
+power budget target is 2.5 watts in a 16x16mm 320 to 360 pin 0.8mm
+FBGA package. Instead of single-issue higher clock rate, the design is
+multi-issue, aiming for around 800mhz.
+
+The lower pincount, lower power, and higher BGA pitch is all to reduce
+the cost of product development when it comes to PCB design and layout:
+
+* Above 4 watts requires metal packages, greater attention to thermal
+  management in the PCB design and layout, and much pricier PMICs.
+* 0.6mm pitch BGA and below requires much more expensive PCB manufacturing
+  equipment and more costly PCBA techniques.
+* Above 600 pins begins to reduce production yields as well as increase
+  the cost of testing and packaging.
+
+We can look at larger higher-power ASICs either later or, if funding
+is made available, immediately.
+
+Recent applications to NLNet (Oct 2019) are for a test chip in 180nm, 64 bit, single core dual issue, around 300 to 350mhz. This will provide the confidence to go to higher geometries, as well as be a commercially viable embedded product in its own right.
+
+# Business Objectives
+
+* the project shall be a hybrid CPU-GPU because if it is not, the
+  complexity involved in developing a split shared-memory CPU-GPU both
+  at a hardware and a software level will be so costly it will jeapordise
+  the project.
+* the project shall be commercial and mass-volume (100 million units
+  and above)
+* the project shall be entirely transparent so that end-users will be
+  able to trust it
+* the source code shall be available at all times for all components
+  for BUSINESS reasons, making development and use of SDKs dead simple
+  and aiding and assisting developers AND BUSINESSES in debugging and thus
+  hugely saving them money.
+
+# Links:
+
+* [[shakti/m_class/libre_3d_gpu]]
+* [[discussion]]
+* [[resources]]
+* [[overview]]
+* [[3d_gpu/funding]]
+* Founding [[charter]]
+* Mailing list
+* Crowdsupply page
+* Wiki
+* Git repositories
+* Bugtracker
+* Kazan Vulkan Driver (including 3D engine)
+* [NLNet 2019 Milestones](http://bugs.libre-riscv.org/buglist.cgi?columnlist=assigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Ccf_budget&f1=cf_nlnet_milestone&o1=equals&query_format=advanced&resolution=---&v1=NLnet.2019.02)
+* NLNet Project Page
+* [[nlnet_proposals]]
+
+# Progress:
+
+* Dec 2019: Second round NLNet questions answered. External Review completed. 6 NLNet proposals accepted (EUR 200,000+)
+* Nov 2019: Alternative FP library to Berkeley softfloat developed. NLNet first round questions answered.
+* Oct 2019: 3D Standards continued. POWER ISA considered. Open 3D Alliance begins. NLNet funding applications submitted.
+* Sep 2019: 3D Standards continued. Additional NLNet Funding proposals discussed.
+* Aug 2019: Development of "Transcendentals" (SIN/COS/ATAN2) Specifications
+* Jul 2019: Sponsorship from Purism received. IEEE754 FP Mul, Add, DIV,
+  FCLASS and FCVT pipelines completed.
+* Jun 2019: IEEE754 FP Mul, Add, and FSM "DIV" completed.
+* May 2019: 6600-style scoreboard started
+* Apr 2019: NLnet funding approved by independent review committee
+* Mar 2019: NLnet funding application first and second phase passed
+* Mar 2019: First successful nmigen pipeline milestone achieved with IEEE754 FADD
+* Feb 2019: Conversion of John Dawson's IEEE754 FPU to nmigen started
+* Jan 2019: Second version Simple-V preliminary proposal (suited to LLVM)
+* 2017 - Nov 2018: Simple-V specification preliminary draft completed
+* Aug 2018 - Nov 2018: spike-sv implementation of draft spec completed
+* Aug 2018: Kazan Vulkan Driver initiated
+* Sep 2018: mailing list established
+* Sep 2018: Crowdsupply pre-launch page up (for updates)
+* Dec 2018: preliminary floorplan and architecture designed (comp.arch)
+
+# News Articles
+
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+
+# Information Resources and Tutorials
+
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+* -
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+*
+* Fundamentals of Modern VLSI Devices
+
+# Analog Simulation
+
+*
+*
+*
+*
+
+# Evaluations
+
+*[[openpower]]