Links:
* <http://www2.eng.cam.ac.uk/~dmh/4b7/resource/section14.htm>
+* <https://www10.edacafe.com/book/ASIC/CH02/CH02.7.php>
* <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
* <https://bugs.libre-soc.org/show_bug.cgi?id=50>
* <https://git.libre-soc.org/?p=c4m-jtag.git;a=tree;hb=HEAD>
+* Extra info: [[/docs/pinmux/temp_pinmux_info]]
Managing IO on an ASIC is nowhere near as simple as on an FPGA.
An FPGA has built-in IO Pads, the wires terminate inside an
In an ASIC, a bi-directional IO Pad requires three wires (in, out,
out-enable) to be routed right the way from the ASIC, all
the way to the IO PAD, where only then does a wire bond connect
-it to a single pin.
+it to a single external pin.
[[!img CH02-44.gif]]
cost but because far more of them fail due to having been
literally hit with a hammer many more times*)
-Yet, the expectation from the market is to be able to fit 1,000++
+Yet, the expectation from the market is to be able to fit 1,000+
pins worth of peripherals into only 200 to 400 worth of actual
IO Pads. The solution here: a GPIO Pinmux, described in some
detail here <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
an ASIC that combines **both** JTAG Boundary Scan **and** GPIO
Muxing, down to layout considerations using coriolis2.
+# Resources, Platforms and Pins
+
+When creating nmigen HDL as Modules, they typically know nothing about FPGA
+Boards or ASICs. They especially do not know anything about the
+Peripheral ICs (UART, I2C, USB, SPI, PCIe) connected to a given FPGA
+on a given PCB, and they should not have to.
+
+Through the Resources, Platforms and Pins API, a level of abstraction
+between peripherals, boards and HDL designs is provided. Peripherals
+may be given `(name, number)` tuples, the HDL design may "request"
+a peripheral, which is described in terms of Resources, managed
+by a ResourceManager, and a Platform may provide that peripheral.
+The Platform is given
+the resposibility to wire up the Pins to the correct FPGA (or ASIC)
+IO Pads, and it is the HDL design's responsibility to connect up
+those same named Pins, on the other side, to the implementation
+of the PHY/Controller, in the HDL.
+
+Here is a function that defines a UART Resource:
+
+ #!/usr/bin/env python3
+ from nmigen.build.dsl import Resource, Subsignal, Pins
+
+ def UARTResource(*args, rx, tx):
+ io = []
+ io.append(Subsignal("rx", Pins(rx, dir="i", assert_width=1)))
+ io.append(Subsignal("tx", Pins(tx, dir="o", assert_width=1)))
+ return Resource.family(*args, default_name="uart", ios=io)
+
+Note that the Subsignal is given a convenient name (tx, rx) and that
+there are Pins associated with it.
+UARTResource would typically be part of a larger function that defines,
+for either an FPGA or an ASIC, a full array of IO Connections:
+
+ def create_resources(pinset):
+ resources = []
+ resources.append(UARTResource('uart', 0, tx='A20', rx='A21'))
+ # add clock and reset
+ clk = Resource("clk", 0, Pins("sys_clk", dir="i"))
+ rst = Resource("rst", 0, Pins("sys_rst", dir="i"))
+ resources.append(clk)
+ resources.append(rst)
+ return resources
+
+For an FPGA, the Pins names are typically the Ball Grid Array
+Pad or Pin name: A12, or N20. ASICs can do likewise: it is
+for convenience when referring to schematics, to use the most
+recogniseable well-known name.
+
+Next, these Resources need to be handed to a ResourceManager or
+a Platform (Platform derives from ResourceManager)
+
+ from nmigen.build.plat import TemplatedPlatform
+
+ class ASICPlatform(TemplatedPlatform):
+ def __init__(self, resources):
+ super().__init__()
+ self.add_resources(resources)
+
+An HDL Module may now be created, which, if given
+a platform instance during elaboration, may request
+a UART (caveat below):
+
+ from nmigen import Elaboratable, Module, Signal
+
+ class Blinker(Elaboratable):
+ def elaborate(self, platform):
+ m = Module()
+ # get the UART resource, mess with the output tx
+ uart = platform.request('uart')
+ intermediary = Signal()
+ m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
+ m.d.comb += intermediary.eq(uart.rx) # pass rx to tx
+
+ return m
+
+The caveat here is that the Resources of the platform actually
+have to have a UART in order for it to be requestable! Thus:
+
+ resources = create_resources() # contains resource named "uart"
+ asic = ASICPlatform(resources)
+ hdl = Blinker()
+ asic.build(hdl)
+
+Finally the association between HDL, Resources, and ASIC Platform
+is made:
+
+* The Resources contain the abstract expression of the
+type of peripheral, its port names, and the corresponding
+names of the IO Pads associated with each port.
+* The HDL which knows nothing about IO Pad names requests
+ a Resource by name
+* The ASIC Platform, given the list of Resources, takes care
+ of connecting requests for Resources to actual IO Pads.
+
+This is the simple version. When JTAG Boundary Scan needs
+to be added, it gets a lot more complex.
+
# JTAG Boundary Scan
JTAG Scanning is a (paywalled) IEEE Standard: 1149.1 which with
the same test can be run on a large batch of ASICs at the same
time.
-IO Pads come in four primary different types:
+IO Pads generally come in four primary different types:
* Input
* Output
* Output with Tristate (enable)
-* Bi-directional Input/Output with direction enable
+* Bi-directional Tristate Input/Output with direction enable
Interestingly these can all be synthesised from one
-Bi-directional IO Pad. Other features such as Differential
-Pairs may also be constructed from an inverter and a pair
+Bi-directional Tristate IO Pad. Other types such as Differential
+Pair Transmit may also be constructed from an inverter and a pair
of IO Pads. Other more advanced features include pull-up
and pull-down resistors, Schmidt triggering for interrupts,
different drive strengths, and so on, but the basics are
The JTAG Boundary Scan therefore needs to know what type
each pad is (In/Out/Bi) and has to "insert" itself in between
-the wires, which may be just an input, or just an output,
+*all* the Pad's wires, which may be just an input, or just an output,
and, if bi-directional, an "output enable" line.
The "insertion" (or, "Tap") into those wires requires a
pair of Muxes for each wire. Under normal operation
-the Muxes bypass JTAG entirely: the IO Pad is connected
+the Muxes bypass JTAG entirely: the IO Pad is connected,
+through the two Muxes,
directly to the Core (a hardware term for a "peripheral",
in Software terminology).
In this way, not only can JTAG control or read the IO Pad,
but it can also read or control the Core (peripheral).
-This is its entire purpose: to allow for the detection
+This is its entire purpose: interception to allow for the detection
and triaging of faults.
* Software may be uploaded and run which sets a bit on
<img src="https://libre-soc.org/shakti/m_class/JTAG/jtag-block.jpg"
width=500 />
+## C4M JTAG TAP
+
+Staf Verhaegen's Chips4Makers JTAG TAP module includes everything
+needed to create JTAG Boundary Scan Shift Registers,
+as well as the IEEE 1149.1 Finite State Machine to access
+them through TMS, TDO, TDI and TCK Signalling. However,
+connecting up cores (a hardware term: the equivalent software
+term is "peripherals") on one side and the pads on the other is
+especially confusing, but deceptively simple. The actual addition
+to the Scan Shift Register is this straightforward:
+
+ from c4m.nmigen.jtag.tap import IOType, TAP
+
+ class JTAG(TAP):
+ def __init__(self):
+ TAP.__init__(self, ir_width=4)
+ self.u_tx = self.add_io(iotype=IOType.Out, name="tx")
+ self.u_rx = self.add_io(iotype=IOType.In, name="rx")
+
+This results in the creation of:
+
+* Two Records, one of type In named rx, the other an output
+ named tx
+* Each Record contains a pair of sub-Records: one core-side
+ and the other pad-side
+* Entries in the Boundary Scan Shift Register which if set
+ may control (or read) either the peripheral / core or
+ the IO PAD
+* A suite of Muxes (as shown in the diagrams above) which
+ allow either direct connection between pad and core
+ (bypassing JTAG) or interception
+
+During Interception Mode (Scanning) pad and core are connected
+to the Shift Register. During "Production" Mode, pad and
+core are wired directly to each other (on a per-pin basis,
+for every pin. Clearly this is a lot of work).
+
+It is then your responsibility to:
+
+* connect up each and every peripheral input and output
+ to the right IO Core Record in your HDL
+* connect up each and every IO Pad input and output
+ to the right IO Pad in the Platform. **This
+ does not happen automatically and is not the
+ responsibility of the TAP Interface*
+
+The TAP interface connects the **other** side of the pads
+and cores Records: **to the Muxes**. You **have** to
+connect **your** side of both core and pads Records in
+order for the Scan to be fully functional.
+
+Both of these tasks are painstaking and tedious in the
+extreme if done manually, and prone to either sheer boredom,
+transliteration errors, dyslexia triggering or just utter
+confusion. Despite this, let us proceed, and, augmenting
+the Blinky example, wire up a JTAG instance:
+
+ class Blinker(Elaboratable):
+ def elaborate(self, platform):
+ m = Module()
+ m.submodules.jtag = jtag = JTAG()
+
+ # get the records from JTAG instance
+ utx, urx = jtag.u_tx, jtag.u_rx
+ # get the UART resource, mess with the output tx
+ p_uart = platform.request('uart')
+
+ # uart core-side from JTAG
+ intermediary = Signal()
+ m.d.comb += utx.core.o.eq(~intermediary) # invert, for fun
+ m.d.comb += intermediary.eq(urx.core.i) # pass rx to tx
+
+ # wire up the IO Pads (in right direction) to Platform
+ m.d.comb += uart.rx.eq(utx.pad.i) # receive rx from JTAG input pad
+ m.d.comb += utx.pad.o.eq(uart.tx) # transmit tx to JTAG output pad
+ return m
+
+Compared to the non-scan-capable version, which connected UART
+Core Tx and Rx directly to the Platform Resource (and the Platform
+took care of wiring to IO Pads):
+
+* Core HDL is instead wired to the core-side of JTAG Scan
+* JTAG Pad side is instead wired to the Platform
+* (the Platform still takes care of wiring to actual IO Pads)
+
+JTAG TAP capability on UART TX and RX has now been inserted into
+the chain. Using openocd or other program it is possible to
+send TDI, TMS, TDO and TCK signals according to IEEE 1149.1 in order
+to intercept both the core and IO Pads, both input and output,
+and confirm the correct functionality of one even if the other is
+broken, during ASIC testing.
+
+## Libre-SOC Automatic Boundary Scan
+
+Libre-SOC's JTAG TAP Boundary Scan system is a little more sophisticated:
+it hooks into (replaces) ResourceManager.request(), intercepting the request
+and recording what was requested. The above manual linkup to JTAG TAP
+is then taken care of **automatically and transparently**, but to
+all intents and purposes looking exactly like a Platform even to
+the extent of taking the exact same list of Resources.
+
+ class Blinker(Elaboratable):
+ def __init__(self, resources):
+ self.jtag = JTAG(resources)
+
+ def elaborate(self, platform):
+ m = Module()
+ m.submodules.jtag = jtag = self.jtag
+
+ # get the UART resource, mess with the output tx
+ uart = jtag.request('uart')
+ intermediary = Signal()
+ m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
+ m.d.comb += intermediary.eq(uart.rx) # pass rx to tx
+
+ return jtag.boundary_elaborate(m, platform)
+
+Connecting up and building the ASIC is as simple as a non-JTAG,
+non-scanning-aware Platform:
+
+ resources = create_resources()
+ asic = ASICPlatform(resources)
+ hdl = Blinker(resources)
+ asic.build(hdl)
+
+The differences:
+
+* The list of resources was also passed to the HDL Module
+ such that JTAG may create a complete identical list
+ of both core and pad matching Pins
+* Resources were requested from the JTAG instance,
+ not the Platform
+* A "magic function" (JTAG.boundary_elaborate) is called
+ which wires up all of the seamlessly intercepted
+ Platform resources to the JTAG core/pads Resources,
+ where the HDL connected to the core side, exactly
+ as if this was a non-JTAG-Scan-aware Platform.
+* ASICPlatform still takes care of connecting to actual
+ IO Pads, except that the Platform.resource requests were
+ triggered "behind the scenes". For that to work it
+ is absolutely essential that the JTAG instance and the
+ ASICPlatform be given the exact same list of Resources.
+
+
## Clock synchronisation
+Take for example USB ULPI:
+
+<img src="https://www.crifan.com/files/pic/serial_story/other_site/p_blog_bb.JPG"
+width=400 />
+
+Here there is an external incoming clock, generated by the PHY, to which
+both Received *and Transmitted* data and control is synchronised. Notice
+very specifically that it is *not the main processor* generating that clock
+Signal, but the external peripheral (known as a PHY in Hardware terminology)
+
+Firstly: note that the Clock will, obviously, also need to be routed
+through JTAG Boundary Scan, because, after all, it is being received
+through just another ordinary IO Pad, after all. Secondly: note thst
+if it didn't, then clock skew would occur for that peripheral because
+although the Data Wires went through JTAG Boundary Scan MUXes, the
+clock did not. Clearly this would be a problem.
+
+However, clocks are very special signals: they have to be distributed
+evenly to all and any Latches (DFFs) inside the peripheral so that
+data corruption does not occur because of tiny delays.
+To avoid that scenario, Clock Domain Crossing (CDC) is used, with
+Asynchronous FIFOs:
+
+ rx_fifo = stream.AsyncFIFO([("data", 8)], self.rx_depth, w_domain="ulpi", r_domain="sync")
+ tx_fifo = stream.AsyncFIFO([("data", 8)], self.tx_depth, w_domain="sync", r_domain="ulpi")
+ m.submodules.rx_fifo = rx_fifo
+ m.submodules.tx_fifo = tx_fifo
+
+However the entire FIFO must be covered by two Clock H-Trees: one
+by the ULPI external clock, and the other the main system clock.
+The size of the ULPI clock H-Tree, and consequently the size of
+the PHY on-chip, will result in more Clock Tree Buffers being
+inserted into the chain, and, correspondingly, matching buffers
+on the ULPI data input side likewise must be inserted so that
+the input data timing precisely matches that of its clock.
+
+The problem is not receiving of data, though: it is transmission
+on the output ULPI side. With the ULPI Clock Tree having buffers
+inserted, each buffer creates delay. The ULPI output FIFO has to
+correspondingly be synchronised not to the original incoming clock
+but to that clock *after going through H Tree Buffers*. Therefore,
+there will be a lag on the output data compared to the incoming
+(external) clock
+
+# Pinmux GPIO Block
+The following diagram is an example of a GPIO block with switchable banks and comes from the Ericson presentation on a GPIO architecture.
+[[!img gpio_block.png size="600x"]]
+
+The block we are developing is very similar, but is lacking some of configuration of the former (due to complexity and time constraints).
+
+## Diagram
+[[!img banked_gpio_block.jpg size="600x"]]
+
+*(Diagram is missing the "ie" signal as part of the bundle of signals given to the peripherals, will be updated later)*
+
+## Explanation
+The simple GPIO module is multi-GPIO block integral to the pinmux system.
+To make the block flexible, it has a variable number of of I/Os based on an
+input parameter.
+
+By default, the block is memory-mapped WB bus GPIO. The CPU
+core can just write the configuration word to the GPIO row address. From this
+perspective, it is no different to a conventional GPIO block.
+
+### Bank Select Options
+* bank 0 - WB bus has full control (GPIO peripheral)
+* bank 1,2,3 - WB bus only controls puen/pden, periphal gets o/oe/i/ie (Not
+fully specified how this should be arranged yet)
+
+Bank select however, allows to switch over the control of the GPIO block to
+another peripheral. The peripheral will be given sole connectivity to the
+o/oe/i/ie signals, while additional parameters such as pull up/down will either
+be automatically configured (as the case for I2C), or will be configurable
+via the WB bus. *(This has not been implemented yet, so open to discussion)*
+
+## Configuration Word
+After a discussion with Luke on IRC (14th January 2022), new layout of the
+8-bit data word for configuring the GPIO (through CSR):
+
+* oe - Output Enable (see the Ericson presentation for the GPIO diagram)
+* ie - Input Enable
+* puen - Pull-Up resistor enable
+* pden - Pull-Down resistor enable
+* i/o - When configured as output (oe set), this bit sets/clears output. When
+configured as input, shows the current state of input (read-only)
+* bank_sel[2:0] - Bank Select (only 4 banks used)
+
+### Simultaneous/Packed Configuration
+To make the configuration more efficient, multiple GPIOs can be configured with
+one data word. The number of GPIOs in one "row" is dependent on the width of the
+WB data bus.
+
+If for example, the data bus is 64-bits wide, eight GPIO configuration bytes -
+and thus eight GPIOs - are configured in one go. There is no way to specify
+which GPIO in a row is configured, so the programmer has to keep the current
+state of the configuration as part of the code (essentially a shadow register).
+
+The diagram below shows the layout of the configuration byte, and how it fits
+within a 64-bit data word.
+
+[[!img gpio_csr_example.jpg size="600x"]]
+
+If the block is created with more GPIOs than can fit in a single data word,
+the next set of GPIOs can be accessed by incrementing the address.
+For example, if 16 GPIOs are instantiated and 64-bit data bus is used, GPIOs
+0-7 are accessed via address 0, whereas GPIOs 8-15 are accessed by address 8
+(TODO: DOES ADDRESS COUNT WORDS OR BYTES?)
+
+## Example Memory Map
+[[!img gpio_memory_example.jpg size="600x"]]
+
+The diagrams above show the difference in memory layout between 16-GPIO block
+implemented with 64-bit and 32-bit WB data buses.
+The 64-bit case shows there are two rows with eight GPIOs in each, and it will
+take two writes (assuming simple WB write) to completely configure all 16 GPIOs.
+The 32-bit on the other hand has four address rows, and so will take four write transactions.
+
+64-bit:
+
+* 0x00 - Configure GPIOs 0-7
+* 0x01 - Configure GPIOs 8-15
+
+32-bit:
+
+* 0x00 - Configure GPIOs 0-3
+* 0x01 - Configure GPIOs 4-7
+* 0x02 - Configure GPIOs 8-11
+* 0x03 - Configure GPIOs 12-15
+
+
+## Combining JTAG BS Chain and Pinmux (In Progress)
+[[!img io_mux_bank_planning.JPG size="600x"]]
+
+The JTAG BS chain need to have access to the bank select bits, to allow
+selecting different peripherals during testing. At the same time, JTAG may
+also require access to the WB bus to access GPIO configuration options
+not available to bank 1/2/3 peripherals.
+
+### Proposal
+TODO: REWORK BASED ON GPIO JTAG DIAGRAMS BELOW
+The proposed JTAG BS chain is as follows:
+
+* Between each peripheral and GPIO block, add a JTAG BS chain. For example
+the I2C SDA line will have core o/oe/i/ie, and from JTAG the pad o/oe/i/ie will
+connect to the GPIO block's ports 1-3.
+* Provide a test port for the GPIO block that gives full access to configuration
+(o/oe/i/ie/puen/pden) and bank select. Only allow full JTAG configuration *IF*
+ban select bit 2 is set!
+* No JTAG chain between WB bus and GPIO port 0 input *(not sure what to do for
+this, or whether it is even needed)*.
+
+Such a setup would allow the JTAG chain to control the bank select when testing
+connectivity of the peripherals, as well as give full control to the GPIO
+configuration when bank select bit 2 is set.
+
+For the purposes of muxing peripherals, bank select bit 2 is ignored. This means
+that even if JTAG is handed over full control, the peripheral is still connected
+to the GPIO block (via the BS chain).
+
+Signals for various ports:
+
+* WB bus or Periph0: WB data read, data write, address, cyc, stb, ack
+* Periph1/2/3: o,oe,i,ie (puen/pden are only controlled by WB, test port, or
+fixed by functionality)
+* Test port: bank_select[2:0], o,oe,i,ie,puen,pden. In addition, internal
+address to access individual GPIOs will be available (this will consist of a
+few bits, as more than 16 GPIOs per block is likely to be to big).
+
+As you can see by the above list, the GPIO block is becoming quite a complex
+beast. If there are suggestions to simplify or reduce some of the signals,
+that will be helpful.*
+
+The diagrams below show 1-bit GPIO connectivity, as well as the 4-bit case.
+
+[[!img gpio_jtag_1bit.jpg size="600x"]]
+
+[[!img gpio_jtag_4bit.jpg size="600x"]]
+
+# Core/Pad Connection + JTAG Mux
+Diagram constructed from the nmigen plat.py file.
-# GPIO Muxing
+[[!img i_o_io_tristate_jtag.JPG size="600x"]]
-[[!img gpio_block.png]]