X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=3d_gpu%2Flayouts%2Fcoriolis2_180nm.mdwn;h=c6f67ec9af54b7ce3679a69076db54ca79f5850a;hb=dfcf992cb61bf0de42700ba361d934659ab06ca1;hp=c58a3b785aa2c46a6886bdd2a1d0a400323037a1;hpb=b17537b2f0e7746c34444138ea226e10e415cec4;p=libreriscv.git

diff --git a/3d_gpu/layouts/coriolis2_180nm.mdwn b/3d_gpu/layouts/coriolis2_180nm.mdwn
index c58a3b785..c6f67ec9a 100644
--- a/3d_gpu/layouts/coriolis2_180nm.mdwn
+++ b/3d_gpu/layouts/coriolis2_180nm.mdwn
@@ -1,5 +1,121 @@
 # Coriolis2 180nm layout
 
-TODO
+* - toplevel
+* - main layout
+* - this page
+*
+* [[180nm_Oct2020]]
 
-*
+# Simple floorplan
+
+[[!img simple_floorplan.png size="500x"]]
+
+## Register files
+
+There are 6 register files: STATE, SPR, INT, CR, XER and FAST.
+
+Access to each of the ports is managed via a "Priority Picker" - a
+unary-in but one-hot unary-out picker - which allows one and only one
+"user" of a given regfile port at any one time.
+
+## Computation Units
+
+There are 8 Function Units: ALU, Logical, Condition, Branch, ShiftRot, LDST,
+Trap, and SPRs.
+
+Each Function Unit has operand inputs and operand outputs. Across *all*
+pipelines there are multiple Function Units that require "RA" (Register A,
+Integer Register File). All such "RA" read requests are (surprise)
+connected to the same "Priority Picker" mentioned above: likewise
+all Function Units requiring a write to the "RT" register are connected
+to the exact same "RT-managing" Write Priority Picker.
+
+### Load Store Computation Unit(s)
+
+Load/Store is a special type of Computation Unit that additionally has
+access to external memory. In the case where multiple LDSTCompUnits
+are added, L0CacheBuffer is responsible for "merging" these into single
+requests.
+
+There are however *two* L0 Caches (both 128-bit wide), with a split
+on address bit 4 for selecting either the odd L0 Cache or the even L0 Cache.
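The odd/even split described above can be illustrated with a minimal sketch. It assumes byte addressing (so a 128-bit line covers 16 bytes and bit 4 is the first bit above the line offset) and assumes that bit 4 set means "odd"; neither the mapping nor the function name `select_l0_cache` comes from the source.

```python
LINE_BYTES = 128 // 8  # a 128-bit wide L0 cache line is 16 bytes


def select_l0_cache(addr: int) -> str:
    """Pick the L0 cache for a byte address from address bit 4.

    Assumption (not stated in the source): bit 4 set selects the
    "odd" cache, bit 4 clear selects the "even" cache.
    """
    return "odd" if (addr >> 4) & 1 else "even"


if __name__ == "__main__":
    # Consecutive 16-byte lines alternate between the two caches.
    for addr in (0x00, 0x0F, 0x10, 0x1F, 0x20):
        print(hex(addr), select_l0_cache(addr))
```

Under these assumptions, consecutive 16-byte lines alternate between the two caches, so sequential accesses are spread across both.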
+
+Each of the two L0 caches has dual 64-bit Wishbone interfaces giving
+a total of *four* 64-bit Memory Bus requests that will be merged through
+an Arbiter down onto the same Memory Bus that the I-Cache is also connected
+to.
+
+## Instructions
+
+Instructions are decoded by PowerDecoder2, after being read by the
+simple core FSM from the Instruction Cache. Currently this cache is an
+extremely simple memory block, to be replaced by a proper I-Cache
+with a proper connection to the Memory Bus (wishbone).
+
+# IO Ring and JTAG
+
+[[!img 180nm_Oct2020/ls180.svg size="500x" ]]
+
+The IO Ring is autogenerated from the same pinmux program
+that created the [[180nm_Oct2020/pinouts]] and the SVG
+image. The image was used by Greatek for packaging as well as for
+a PCB designed by Professor Galayko of Sorbonne University.
+
+The exact same pinmux program's output, specifying all interfaces,
+was also used to autogenerate the HDL for the JTAG Boundary Scan.
+
+By strictly using the exact same *machine readable* specification
+for all Interfaces, and using only autogenerated techniques, it was possible
+to ensure complete consistency across:
+
+* Markdown file
+* SVG Image for packaging
+* IO Ring
+* JTAG Boundary Scan
+
+JTAG also contains a Wishbone Master for direct access to Memory
+and a DMI Interface for controlling the core. In simulations
+a JTAG client was implemented both in nmigen HDL and in
+verilator. The exact same openocd scripts or direct
+JTAG connectivity using jtagremote can then be used on:
+
+* nmigen HDL simulations
+* verilator simulations
+* [[HDL_workflow/ECP5_FPGA]]
+* the actual ls180 ASIC
+
+
+
+# Building
+
+To build see [[HDL_workflow/coriolis2]]. A tag has been used and the
+build instructions specify it. The soclayout repository is standalone,
+containing a snapshot of the verilog autogenerated output.
+
+# About coriolis2
+
+There are several talks online now.
+
+* [[conferences/fosdem2022]]
+*
+
+Jean-Paul Chaput of LIP6 carried out several improvements to coriolis2
+in order for it to cope with an 800,000 transistor 30 mm^2 180nm layout.
+These included:
+
+* Automatic antenna diodes (needed for stopping ESD)
+* Clock tree improvements
+* Dual Power rings (Core, IO)
+* Automatic buffer insertion (clock tree synchronised)
+* High fanout buffers (1 to 128) and repeater buffers
+
+Overall it was a significant amount of work, and the result is an entirely
+automated `RTL2GDS` flow with no manual intervention required.
+
+
+
+coriolis2 converts verilog to BLIF using yosys and the Cell Library, then converts
+BLIF into a VHDL subset. This subset is extremely simple, comprising
+links (netlists) to cells and nothing more. It can be extracted and
+converted to actual VHDL and substituted successfully into verilator,
+ghdl or icarus simulations using cocotb (caveat: the files are enormous).