Links:
* <http://www2.eng.cam.ac.uk/~dmh/4b7/resource/section14.htm>
+* <https://www10.edacafe.com/book/ASIC/CH02/CH02.7.php>
* <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
* <https://bugs.libre-soc.org/show_bug.cgi?id=50>
* <https://git.libre-soc.org/?p=c4m-jtag.git;a=tree;hb=HEAD>
+* Extra info: [[/docs/pinmux/temp_pinmux_info]]
Managing IO on an ASIC is nowhere near as simple as on an FPGA.
An FPGA has built-in IO Pads, the wires terminate inside an
In an ASIC, a bi-directional IO Pad requires three wires (in, out,
out-enable) to be routed right the way from the ASIC, all
the way to the IO PAD, where only then does a wire bond connect
-it to a single pin.
+it to a single external pin.
[[!img CH02-44.gif]]
cost but because far more of them fail due to having been
literally hit with a hammer many more times*)
-Yet, the expectation from the market is to be able to fit 1,000++
+Yet, the expectation from the market is to be able to fit 1,000+
pins worth of peripherals into only 200 to 400 worth of actual
IO Pads. The solution here: a GPIO Pinmux, described in some
detail here <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
an ASIC that combines **both** JTAG Boundary Scan **and** GPIO
Muxing, down to layout considerations using coriolis2.
+# Resources, Platforms and Pins
+
+When creating nmigen HDL as Modules, they typically know nothing about FPGA
+Boards or ASICs. They especially do not know anything about the
+Peripheral ICs (UART, I2C, USB, SPI, PCIe) connected to a given FPGA
+on a given PCB, and they should not have to.
+
+Through the Resources, Platforms and Pins API, a level of abstraction
+between peripherals, boards and HDL designs is provided. Peripherals
+may be given `(name, number)` tuples. The HDL design may "request"
+a peripheral, which is described in terms of Resources managed
+by a ResourceManager, and a Platform may provide that peripheral.
+The Platform is given
+the responsibility to wire up the Pins to the correct FPGA (or ASIC)
+IO Pads, and it is the HDL design's responsibility to connect up
+those same named Pins, on the other side, to the implementation
+of the PHY/Controller, in the HDL.
+
+Here is a function that defines a UART Resource:
+
+    #!/usr/bin/env python3
+    from nmigen.build.dsl import Resource, Subsignal, Pins
+
+    def UARTResource(*args, rx, tx):
+        io = []
+        io.append(Subsignal("rx", Pins(rx, dir="i", assert_width=1)))
+        io.append(Subsignal("tx", Pins(tx, dir="o", assert_width=1)))
+        return Resource.family(*args, default_name="uart", ios=io)
+
+Note that the Subsignal is given a convenient name (tx, rx) and that
+there are Pins associated with it.
+UARTResource would typically be part of a larger function that defines,
+for either an FPGA or an ASIC, a full array of IO Connections:
+
+    def create_resources(pinset=None):
+        # pinset is unused in this minimal example
+        resources = []
+        resources.append(UARTResource('uart', 0, tx='A20', rx='A21'))
+        # add clock and reset
+        clk = Resource("clk", 0, Pins("sys_clk", dir="i"))
+        rst = Resource("rst", 0, Pins("sys_rst", dir="i"))
+        resources.append(clk)
+        resources.append(rst)
+        return resources
+
+For an FPGA, the Pins names are typically the Ball Grid Array
+Pad or Pin name: A12, or N20. ASICs can do likewise: for
+convenience when referring to schematics, use the most
+recognisable well-known name.
+
+Next, these Resources need to be handed to a ResourceManager or
+a Platform (Platform derives from ResourceManager):
+
+    from nmigen.build.plat import TemplatedPlatform
+
+    class ASICPlatform(TemplatedPlatform):
+        def __init__(self, resources):
+            super().__init__()
+            self.add_resources(resources)
+
+An HDL Module may now be created, which, if given
+a platform instance during elaboration, may request
+a UART (caveat below):
+
+    from nmigen import Elaboratable, Module, Signal
+
+    class Blinker(Elaboratable):
+        def elaborate(self, platform):
+            m = Module()
+            # get the UART resource, mess with the output tx
+            uart = platform.request('uart')
+            intermediary = Signal()
+            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
+            m.d.comb += intermediary.eq(uart.rx) # pass rx to tx
+
+            return m
+
+The caveat here is that the Resources of the platform actually
+have to have a UART in order for it to be requestable! Thus:
+
+    resources = create_resources() # contains resource named "uart"
+    asic = ASICPlatform(resources)
+    hdl = Blinker()
+    asic.build(hdl)
+
+Finally the association between HDL, Resources, and ASIC Platform
+is made:
+
+* The Resources contain the abstract expression of the
+ type of peripheral, its port names, and the corresponding
+ names of the IO Pads associated with each port.
+* The HDL, which knows nothing about IO Pad names, requests
+ a Resource by name.
+* The ASIC Platform, given the list of Resources, takes care
+ of connecting requests for Resources to actual IO Pads.
+
+This is the simple version. When JTAG Boundary Scan needs
+to be added, it gets a lot more complex.
+
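The request-by-name association summarised above can be sketched in plain Python, without nmigen. This is a toy model only: `ToyResource` and `ToyPlatform` are hypothetical names invented for illustration, and the real ResourceManager does considerably more (direction handling, constraints, and so on).

```python
# Toy model only: ToyResource and ToyPlatform are hypothetical names,
# not part of nmigen. They mimic the request-by-name flow in plain Python.

class ToyResource:
    def __init__(self, name, number, **pins):
        self.name = name        # e.g. "uart"
        self.number = number    # e.g. 0
        self.pins = pins        # port name -> IO Pad name

class ToyPlatform:
    def __init__(self, resources):
        # index resources by their (name, number) tuple
        self.resources = {(r.name, r.number): r for r in resources}
        self.requested = {}

    def request(self, name, number=0):
        key = (name, number)
        if key not in self.resources:
            raise KeyError("resource %s#%d does not exist" % key)
        if key in self.requested:
            raise KeyError("resource %s#%d already requested" % key)
        self.requested[key] = self.resources[key]
        return self.requested[key]

plat = ToyPlatform([ToyResource("uart", 0, tx="A20", rx="A21")])
uart = plat.request("uart")       # the HDL side only knows the name
print(uart.pins["tx"])            # the platform side knows the pad
```

Requesting a name that was never added (for example `plat.request("spi")`) raises `KeyError` in this sketch, which mirrors the caveat that a Resource must exist before it can be requested.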
# JTAG Boundary Scan
JTAG Scanning is a (paywalled) IEEE Standard: 1149.1 which with
the same test can be run on a large batch of ASICs at the same
time.
-IO Pads come in four primary different types:
+IO Pads generally come in four primary different types:
* Input
* Output
* Output with Tristate (enable)
-* Bi-directional Input/Output with direction enable
+* Bi-directional Tristate Input/Output with direction enable
Interestingly these can all be synthesised from one
-Bi-directional IO Pad. Other features such as Differential
-Pairs may also be constructed from an inverter and a pair
+Bi-directional Tristate IO Pad. Other types such as Differential
+Pair Transmit may also be constructed from an inverter and a pair
of IO Pads. Other more advanced features include pull-up
and pull-down resistors, Schmitt triggering for interrupts,
different drive strengths, and so on, but the basics are
The JTAG Boundary Scan therefore needs to know what type
each pad is (In/Out/Bi) and has to "insert" itself in between
-the wires, which may be just an input, or just an output,
+*all* the Pad's wires, which may be just an input, or just an output,
and, if bi-directional, an "output enable" line.
The "insertion" (or, "Tap") into those wires requires a
pair of Muxes for each wire. Under normal operation
-the Muxes bypass JTAG entirely: the IO Pad is connected
+the Muxes bypass JTAG entirely: the IO Pad is connected,
+through the two Muxes,
directly to the Core (a hardware term for a "peripheral",
in Software terminology).
In this way, not only can JTAG control or read the IO Pad,
but it can also read or control the Core (peripheral).
-This is its entire purpose: to allow for the detection
+This is its entire purpose: interception to allow for the detection
and triaging of faults.
* Software may be uploaded and run which sets a bit on
<img src="https://libre-soc.org/shakti/m_class/JTAG/jtag-block.jpg"
width=500 />
+## C4M JTAG TAP
+
+Staf Verhaegen's Chips4Makers JTAG TAP module includes everything
+needed to create JTAG Boundary Scan Shift Registers,
+as well as the IEEE 1149.1 Finite State Machine to access
+them through TMS, TDO, TDI and TCK Signalling. However,
+connecting up cores (a hardware term: the equivalent software
+term is "peripherals") on one side and the pads on the other is
+especially confusing, but deceptively simple. The actual addition
+to the Scan Shift Register is this straightforward:
+
+    from c4m.nmigen.jtag.tap import IOType, TAP
+
+    class JTAG(TAP):
+        def __init__(self):
+            TAP.__init__(self, ir_width=4)
+            self.u_tx = self.add_io(iotype=IOType.Out, name="tx")
+            self.u_rx = self.add_io(iotype=IOType.In, name="rx")
+
+This results in the creation of:
+
+* Two Records, one of type In named rx, the other an output
+ named tx
+* Each Record contains a pair of sub-Records: one core-side
+ and the other pad-side
+* Entries in the Boundary Scan Shift Register which if set
+ may control (or read) either the peripheral / core or
+ the IO PAD
+* A suite of Muxes (as shown in the diagrams above) which
+ allow either direct connection between pad and core
+ (bypassing JTAG) or interception
+
+During Interception Mode (Scanning) pad and core are connected
+to the Shift Register. During "Production" Mode, pad and
+core are wired directly to each other (on a per-pin basis,
+for every pin. Clearly this is a lot of work).
+
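The two modes can be modelled for a single wire in plain Python. This is a deliberately simplified sketch, not the c4m-jtag implementation: it collapses the Muxes to one selection per wire, and `intercept` and `scan_bit` are hypothetical names standing in for the mode select and the matching Shift Register entry.

```python
# Plain-Python model of the per-wire Muxes; not the c4m-jtag code.
# "scan_bit" stands for the matching Boundary Scan Shift Register entry.

def mux(sel, a, b):
    """Return a when sel is true, else b."""
    return a if sel else b

def input_wire(intercept, scan_bit, pad_i):
    # value seen by the core on an input (In) wire
    return mux(intercept, scan_bit, pad_i)

def output_wire(intercept, scan_bit, core_o):
    # value driven onto the pad on an output (Out) wire
    return mux(intercept, scan_bit, core_o)

# Production Mode: pad and core are wired straight through
production_in = input_wire(False, scan_bit=0, pad_i=1)
production_out = output_wire(False, scan_bit=0, core_o=1)

# Interception Mode: the Shift Register entry replaces the far side
scan_in = input_wire(True, scan_bit=0, pad_i=1)
scan_out = output_wire(True, scan_bit=0, core_o=1)

print(production_in, production_out, scan_in, scan_out)
```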
+It is then your responsibility to:
+
+* connect up each and every peripheral input and output
+ to the right IO Core Record in your HDL
+* connect up each and every IO Pad input and output
+ to the right IO Pad in the Platform. **This
+ does not happen automatically and is not the
+ responsibility of the TAP Interface**
+
+The TAP interface connects the **other** side of the pads
+and cores Records: **to the Muxes**. You **have** to
+connect **your** side of both core and pads Records in
+order for the Scan to be fully functional.
+
+Both of these tasks are painstaking and tedious in the
+extreme if done manually, and prone to either sheer boredom,
+transliteration errors, dyslexia triggering or just utter
+confusion. Despite this, let us proceed, and, augmenting
+the Blinker example, wire up a JTAG instance:
+
+    class Blinker(Elaboratable):
+        def elaborate(self, platform):
+            m = Module()
+            m.submodules.jtag = jtag = JTAG()
+
+            # get the records from JTAG instance
+            utx, urx = jtag.u_tx, jtag.u_rx
+            # get the UART resource from the Platform
+            uart = platform.request('uart')
+
+            # uart core-side from JTAG
+            intermediary = Signal()
+            m.d.comb += utx.core.o.eq(~intermediary) # invert, for fun
+            m.d.comb += intermediary.eq(urx.core.i) # pass rx to tx
+
+            # wire up the IO Pads (in right direction) to Platform
+            m.d.comb += urx.pad.i.eq(uart.rx) # rx IO Pad feeds JTAG pad-side input
+            m.d.comb += uart.tx.eq(utx.pad.o) # JTAG pad-side output drives tx IO Pad
+            return m
+
+Compared to the non-scan-capable version, which connected UART
+Core Tx and Rx directly to the Platform Resource (and the Platform
+took care of wiring to IO Pads):
+
+* Core HDL is instead wired to the core-side of JTAG Scan
+* JTAG Pad side is instead wired to the Platform
+* (the Platform still takes care of wiring to actual IO Pads)
+
+JTAG TAP capability on UART TX and RX has now been inserted into
+the chain. Using openocd or another program it is possible to
+drive TDI, TMS and TCK (and read back TDO) according to IEEE 1149.1 in order
+to intercept both the core and IO Pads, both input and output,
+and confirm the correct functionality of one even if the other is
+broken, during ASIC testing.
+
+## Libre-SOC Automatic Boundary Scan
+
+Libre-SOC's JTAG TAP Boundary Scan system is a little more sophisticated:
+it hooks into (replaces) ResourceManager.request(), intercepting the request
+and recording what was requested. The above manual linkup to JTAG TAP
+is then taken care of **automatically and transparently**, while to
+all intents and purposes looking exactly like a Platform, even to
+the extent of taking the exact same list of Resources.
+
+    class Blinker(Elaboratable):
+        def __init__(self, resources):
+            self.jtag = JTAG(resources)
+
+        def elaborate(self, platform):
+            m = Module()
+            m.submodules.jtag = jtag = self.jtag
+
+            # get the UART resource, mess with the output tx
+            uart = jtag.request('uart')
+            intermediary = Signal()
+            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
+            m.d.comb += intermediary.eq(uart.rx) # pass rx to tx
+
+            return jtag.boundary_elaborate(m, platform)
+
+Connecting up and building the ASIC is as simple as a non-JTAG,
+non-scanning-aware Platform:
+
+    resources = create_resources()
+    asic = ASICPlatform(resources)
+    hdl = Blinker(resources)
+    asic.build(hdl)
+
+The differences:
+
+* The list of resources was also passed to the HDL Module
+ such that JTAG may create a complete identical list
+ of both core and pad matching Pins
+* Resources were requested from the JTAG instance,
+ not the Platform
+* A "magic function" (JTAG.boundary_elaborate) is called
+ which wires up all of the seamlessly intercepted
+ Platform resources to the JTAG core/pads Resources,
+ where the HDL connected to the core side, exactly
+ as if this was a non-JTAG-Scan-aware Platform.
+* ASICPlatform still takes care of connecting to actual
+ IO Pads, except that the Platform.resource requests were
+ triggered "behind the scenes". For that to work it
+ is absolutely essential that the JTAG instance and the
+ ASICPlatform be given the exact same list of Resources.
+
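The "intercept and record, then replay" idea behind the differences listed above can be sketched in plain Python. This is a toy illustration only: `RecordingRequester` and its `replay` method are hypothetical names, not the Libre-SOC JTAG class or its `boundary_elaborate` function.

```python
# Toy sketch only: RecordingRequester is a hypothetical name and is
# not the Libre-SOC JTAG class. It shows "intercept and record".

class RecordingRequester:
    def __init__(self, resource_names):
        self.available = set(resource_names)
        self.recorded = []          # requests seen so far, in order

    def request(self, name, number=0):
        if name not in self.available:
            raise KeyError(name)
        self.recorded.append((name, number))
        # hand back a stand-in for the core-side pins
        return {"resource": name, "side": "core"}

    def replay(self, platform_request):
        # later, issue the same requests to the real platform
        return [platform_request(n, i) for (n, i) in self.recorded]

jtag = RecordingRequester(["uart"])
core_side = jtag.request("uart")
# a lambda stands in for the Platform's own request function
seen = jtag.replay(lambda name, number: "%s%d" % (name, number))
print(seen)
```

Because the replay step asks the platform for exactly what was recorded, the sketch also shows why both sides must be built from the same list of Resources.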
+
+## Clock synchronisation
+
+Take for example USB ULPI:
+
+<img src="https://www.crifan.com/files/pic/serial_story/other_site/p_blog_bb.JPG"
+width=400 />
+
+Here there is an external incoming clock, generated by the PHY, to which
+both Received *and Transmitted* data and control is synchronised. Notice
+very specifically that it is *not the main processor* generating that clock
+Signal, but the external peripheral (known as a PHY in Hardware terminology).
+
+Firstly: note that the Clock will, obviously, also need to be routed
+through JTAG Boundary Scan because, after all, it is being received
+through just another ordinary IO Pad. Secondly: note that
+if it didn't, then clock skew would occur for that peripheral because
+although the Data Wires went through JTAG Boundary Scan MUXes, the
+clock did not. Clearly this would be a problem.
+
+However, clocks are very special signals: they have to be distributed
+evenly to all and any Latches (DFFs) inside the peripheral so that
+data corruption does not occur because of tiny delays.
+To avoid that scenario, Clock Domain Crossing (CDC) is used, with
+Asynchronous FIFOs:
+
+    rx_fifo = stream.AsyncFIFO([("data", 8)], self.rx_depth, w_domain="ulpi", r_domain="sync")
+    tx_fifo = stream.AsyncFIFO([("data", 8)], self.tx_depth, w_domain="sync", r_domain="ulpi")
+    m.submodules.rx_fifo = rx_fifo
+    m.submodules.tx_fifo = tx_fifo
+
+However, the entire FIFO must be covered by two Clock H-Trees: one
+driven by the ULPI external clock, and the other by the main system clock.
+The size of the ULPI clock H-Tree, and consequently the size of
+the PHY on-chip, will result in more Clock Tree Buffers being
+inserted into the chain, and, correspondingly, matching buffers
+on the ULPI data input side likewise must be inserted so that
+the input data timing precisely matches that of its clock.
+
+The problem is not receiving of data, though: it is transmission
+on the output ULPI side. With the ULPI Clock Tree having buffers
+inserted, each buffer creates delay. The ULPI output FIFO has to
+correspondingly be synchronised not to the original incoming clock
+but to that clock *after going through H-Tree Buffers*. Therefore,
+there will be a lag on the output data compared to the incoming
+(external) clock.
+
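As a rough feel for the scale of that lag, the accumulated buffer delay can be compared with the ULPI clock period (ULPI uses a 60 MHz clock). The buffer count and per-buffer delay below are made-up placeholder values for illustration only, and the function name is hypothetical; real figures come from the layout and the cell library.

```python
# Illustrative arithmetic only: the buffer count and per-buffer delay
# below are made-up placeholders, not figures from any real layout.

ULPI_CLOCK_HZ = 60_000_000          # ULPI clock supplied by the PHY

def output_lag_ps(n_buffers, buffer_delay_ps):
    """Delay accumulated by the clock through the H-Tree buffers."""
    return n_buffers * buffer_delay_ps

period_ps = 10**12 // ULPI_CLOCK_HZ  # one clock period, in picoseconds
lag_ps = output_lag_ps(n_buffers=4, buffer_delay_ps=50)
print(lag_ps, "ps lag out of a", period_ps, "ps period")
```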
+# GPIO Muxing
+
[[!img gpio_block.png]]
+
+[[!img io_mux_bank_planning.JPG]]
+
+# Core/Pad Connection + JTAG Mux
+
+Diagram constructed from the nmigen plat.py file.
+
+[[!img i_o_io_tristate_jtag.JPG]]
+